Diggin' technology every day

September 4, 2015

Containers – my experiences good and bad

Filed under: linux — Tags: — Nate @ 6:46 pm

This is a followup post to an earlier post I had responding to container hype (more specifically perhaps Docker hype).

I want to share some of my (albeit limited) real-world experience with containers (which play a part in generating well north of two hundred million a year in revenue), the good and the bad, along with how I decided to make use of them and where I see using them in the future.

I wrote a lot of this in a comment on el reg not too long ago, so I thought I would formalize it here so I can refer people to it if needed. Obviously I have much better control over formatting on a blog than in a comment box.

The case for containers

The initial use case for containers at my organization was very targeted at one specific web application. From a server perspective up until this point we were 100% virtualized under VMware ESXi Enterprise Plus.

This web application drives the core e-commerce engine of the business, it is a commercial product (though an open source version exists), and the license cost is north of $10,000/year per installation. So for example if you have a VMware server with 5 VMs on it, each running this application in production you will pay north of $50,000/year in license fees/support for those 5 VMs. There is no license model where they license per CPU, or per CPU core, or per physical host(at this time anyway).

The application can be very CPU hungry, and in the earliest days we ran the application stack in the Amazon cloud. In early 2012 we moved out and ran it in house on top of VMware. We allocated something like 4 vCPUs per web server running this application. We had 4 web servers active at any given time, though we had the ability to double that capacity very quickly if required.

Farm based software deployment

It was decided early on, before I joined the company, that the deployment model for the applications would be “farm” based. That is, we would have two “banks” of servers, “A” and “B”. Generally one bank would be active at any given time, and to deploy code we would deploy to the inactive servers and then “flip farms”, basically change load balancing routing to point users at the servers with the new code. While in Amazon the line of thinking was we would “spin up” (on demand) new servers, deploy to them, make them live, then terminate the original servers (to save $$). Rinse & Repeat. Reality set in and this never happened; the farms stayed up all the time (short of Amazon failures, which were very frequent relative to current failures anyway).
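To make the “flip farms” idea concrete, here is a minimal Python sketch. It is not our actual tooling; the LoadBalancer class, the server names and the version strings are all made up for illustration, and a real load balancer would be driven through its own API or configuration.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Toy stand-in (hypothetical) for a load balancer routing to one farm at a time."""
    farms: dict                                   # farm name -> list of server names
    active: str = "A"                             # farm currently receiving user traffic
    versions: dict = field(default_factory=dict)  # farm name -> deployed code version

    def inactive(self) -> str:
        """Return the name of the farm that is not serving traffic."""
        return next(name for name in self.farms if name != self.active)

    def deploy(self, version: str) -> str:
        """Deploy new code to the inactive farm only; live users are unaffected."""
        target = self.inactive()
        self.versions[target] = version
        return target

    def flip(self) -> str:
        """Point user traffic at the other farm (a rollback is just another flip)."""
        self.active = self.inactive()
        return self.active


lb = LoadBalancer(
    farms={"A": ["web-a1", "web-a2"], "B": ["web-b1", "web-b2"]},
    versions={"A": "1.0", "B": "1.0"},
)
lb.deploy("1.1")  # lands on farm B, since A is active
lb.flip()         # B is now live with the new code
lb.flip()         # rollback: A, still on the old code, is live again
```

The point of the sketch is that the old code never leaves the inactive farm until the next deploy, which is why flipping back is so quick.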

This model of farm deployments is the same model used at my previous company (with the original Ops director being the same person, so not a big surprise). Obviously it’s not the only way to deploy (these are the only two places I’ve worked at that deploy in this manner), but it works fine. My focus really is not on application deployment so I have not had an interest in pushing to use another model.

When we moved to the data center, the cost of managing both farms was not much, inactive farms used very little CPU, disk space was a non issue(I have perfected log rotation and retention over the years combined with LVM disk management to maximize efficiency of thin provisioning on 3PAR, it runs really well). Memory was a factor to some degree but at the end of the day it wasn’t a big deal.

Having the 2nd farm always running had another benefit. We could, on very short notice activate the 2nd farm and essentially double our production server capacity. We did this(and continue to) for high load events. Obviously it does impact the ability to deploy code when in this situation but we adapted to that a long long time ago.

One big benefit of the farm approach is it makes rollbacks of application code very quick(10-30 seconds). The applications involved generally aren’t expected to operate with mixed versions of the application running simultaneously(obviously depends on the extent of the changes).

The process today which manages activating both “farms” simultaneously does perform a check of the application code on both farms and will not allow them both to go active if they do not match.
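A rough sketch of that kind of safety check, in Python. This is only an illustration of the idea, not the real process; the function names and the shape of the farm_versions mapping are assumptions.

```python
def can_activate_both(farm_versions: dict) -> bool:
    """Return True only if every farm reports the same application version."""
    return len(set(farm_versions.values())) == 1


def activate_both(farm_versions: dict) -> list:
    """Return the farms to make live, refusing outright if their code differs."""
    if not can_activate_both(farm_versions):
        raise RuntimeError(
            f"refusing to activate both farms, versions differ: {farm_versions}"
        )
    return sorted(farm_versions)
```

With matching versions, such as {"A": "1.1", "B": "1.1"}, both farms come back as eligible; with mismatched versions the call raises instead of letting mixed code serve traffic.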

Scaling the application

As a year or two passed the CPU requirements of the application grew (in part due to traffic growth, also in part due to bad code etc). We found ourselves during our high traffic time two years ago keeping both “farms” active for months at a time (making short exceptions for code deployment), to try to ensure we had sufficient capacity. This worked, but it wasn’t the most cost effective model to grow into. As traffic continued to rise, I wanted something (much) faster without breaking the bank.

Moving to physical hardware

Although we were 100% virtualized I did think a good strategy for this application was to move to physical hardware, for two main reasons:

  • Eliminate any overhead from hypervisor
  • I wanted to dedicate entire physical servers to this application, paying VMware license fees for basically a single application on one server seemed like a waste of $

I did not entertain the option of using one of the free hypervisors for four reasons:

  • Didn’t want overhead from the hypervisor
  • Nobody in the organization had solid experience with any other hypervisor
  • Didn’t want another technology stack to manage separately, just needless complexity
  • Xen and KVM aren’t nearly as solid as VMware, just not enough to consider using them for this use case anyway.

So my line of thinking early on wasn’t containers, it was more likely a single OS image, with custom application configurations, and directory structures, two apache instances (one for each “farm”) on each server, and the load balancer would just switch between the apache instances when “flipping farms”. I have done this before to some extent as mentioned in the previous article on containers. It didn’t take long for me to kinda-sorta rule this out as a good idea for a couple of reasons:

  • The application configuration was going to be somewhat unique relative to all other environments (unless we changed all of them which was possible, quite a bit more work though)
  • Not entirely sure how easy it was going to be to get the application to run from two different paths and ensure that it operates correctly (maybe it would have been easy, I don’t know)

So at some point the ideas of containers hit me and I decided to explore that as an option.

Benefit of containers for this use case

  • LXC being built into our existing Ubuntu 12.04 LTS operating system
  • Easily runs on physical hardware
  • “Partitions” the operating system into multiple instances so that they have their own directory structures eliminating the need to have to reconfigure applications to work from a funky layout.
  • Allows me to scale a single container to the entire physical CPU horsepower of the server automatically, while limiting memory usage so the physical host does not run out of memory
  • Allows me to maintain two containers on each host (one for each “farm”), and eliminates the need to “activate both farms” for capacity since all of the capacity is already available.
  • Eliminates $10,000+ fee of VMware licensing
  • Slashes $10,000+/year fee of application by slashing the number of systems required to run it in production now and in the future.
  • Eliminates overhead of hypervisor
  • Eliminates dependency on SAN storage
  • Massive increase in available capacity, roughly 8 X the capacity of the previously virtualized “farm” of servers (or 4X the capacity of both farms combined). Means years of room to grow into without having to think about it.

Limited use case

This is a very targeted deployment for containers. This is a highly available production web application where each server is basically an exact copy of each other. Obviously this means if one physical host or container fails the others continue processing without skipping a beat. There are three physical hosts in this case (HP DL380Gen8 with dual Xeon 2695v2 CPUs (24 cores / 48 threads)- I find it amusing to run top, and tell it to show me all CPUs and it says “Sorry, terminal is not big enough“), and only one is required for current production loads(even on a high traffic day).

These systems are dedicated to this application. You might think that launching these on day one and seeing the CPU usage of the application go from ~45% to under 5% would make me say, oh what a waste of hardware resources, let’s pile more containers on this. No way. We saved an enormous amount of costs in licensing for this application by doing this, enough to pay for the servers quite quickly. We also have capacity for a long time to come, and can handle any bursts in traffic without a worry. It was a concept that turned into a great success story for containers at my organization.

I gave a benefit of eliminating dependency on SAN storage as a bonus; these are the first physical servers that this organization has deployed with internal storage. Everything else is boot from SAN (like I am going to trust a $5 piece of crap USB flash memory stick for a hypervisor when I have multipath fibre channel available; likewise for having internal disks in the servers just for a tiny hypervisor). Obviously the big benefit of shared storage is being able to vmotion between hosts. Can’t do that with containers (as far as I am aware anyway), so we put 5 disks in each server, 4 of them in RAID 10 with one hot spare, and 1GB of battery backed write cache.

So while I love my SAN storage, in this case it wasn’t needed, so we aren’t using it. Saved some costs and complexity on fibre channel cards and connectivity etc(not really an iSCSI fan for production systems).

I did somewhat dread the driver situation going to physical hardware, my last experiences with physical hardware with Linux several years ago were kind of frustrating with the drivers, I remember many times having to build custom kickstart disks for NIC drivers or storage drivers etc.. Fortunately this time around the stock drivers worked fine.

We also saved costs on networking, all of our VMware hosts each have two dual port 10GbE cards, along with 2x1Gbps ports for management(total 11 cables coming out of each server). The container hosts since they really only have one container active at a time rely just on the 2x1Gbps ports, more than enough for a single container(total 5 cables coming out of each server).

No rapid build up or tear down

The original containers have been running continuously (short of a couple of reboots, and some OS patches) for well over a year at this point. They do not have a short life span.

Downsides to containers

No technology is perfect of course, and I did fairly quickly come across some very annoying limitations of container technology inside the Linux kernel, which prevents me from making containers a more general purpose replacement for VMs. Maybe now some of these issues are resolved, I am not sure, I don’t run bleeding edge kernels etc.

  • autofs does not function inside containers. We use autofs for lots of NFS mount points, and not having it operate is very annoying. It was a documented kernel limitation when we deployed containers last year, since we are on the same general kernel version today I don’t believe that has changed for us anyway.
  • Memory capacity is not correctly reported by the container. If the host has 64GB of memory, and the container is limited to 32GB of memory, the general Linux tools inside the container all report 64GB of memory available. Again, annoying, and I imagine this means the container doesn’t handle out of memory situations too gracefully, as it has no idea it is about to run out before it hits the wall.
  • Likewise querying per-container CPU usage using standard linux tools is impossible. Everything reports the same CPU usage whether it is the host, the active container on the host, or the idle container on the host.
  • Running containers that span multiple subnets simultaneously is extremely difficult and complicated. I have probably a dozen different VLANs on VMware hosts each on different subnets, each with different default gateways etc. The routing exists in the Linux kernel and having more than one default gateway is a real pain. I read last year it seemed to be technically possible, but the solution was not at all a practical one. So in the meantime, a host has to be dedicated to a single subnet.
  • Process listings on the container host are quite confusing, as they list the processes for all of the containers as well; identifying which process is from where is confusing and annoying. Having to have custom monitors configured to say, on these hosts having 6 postfix processes is ok but everywhere else 1 is required, is annoying too. I’m sure there are probably lxc-specific tools that can do it but the point is the standard linux tools don’t handle this well at all.
  • Lack of ability to do things like move containers between hosts, some applications, and some environments can be made fully redundant so you can lose a VM/container and be ok. But many others are not. I literally have several hundred VMs each of which are single points of failure because most are development VMs and it is a waste to build redundancy into every development environment the resource requirements would explode. So having things like vmotion & VMware high availability, and even DRS for host affinity rules is very nice to have.
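To illustrate the memory reporting point from the list above, here is a hedged Python sketch of the kind of workaround a monitoring script could use: compare what /proc/meminfo claims with the cgroup memory limit and take the smaller of the two. The helper names are made up, the sample values simply mirror the 64GB host / 32GB container example, and whether the cgroup limit file (memory.limit_in_bytes under the cgroup v1 memory controller) is even readable from inside a given container is an assumption that depends on how cgroups are mounted there.

```python
def meminfo_total_bytes(meminfo_text: str) -> int:
    """Parse the MemTotal line of /proc/meminfo (reported in kB) into bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def effective_memory_bytes(meminfo_text: str, cgroup_limit_text: str) -> int:
    """Return the smaller of the host total and the cgroup memory limit."""
    host_total = meminfo_total_bytes(meminfo_text)
    cgroup_limit = int(cgroup_limit_text.strip())
    return min(host_total, cgroup_limit)


GIB = 1024 ** 3

# Sample inputs standing in for the real files (assumed contents):
# the host reports 64GB, the container is capped at 32GB.
sample_meminfo = "MemTotal:       67108864 kB\nMemFree:            1024 kB\n"
sample_limit = str(32 * GIB) + "\n"

effective = effective_memory_bytes(sample_meminfo, sample_limit)
```

Taking the minimum also covers the uncapped case, assuming the kernel reports a very large number for a container with no limit. It doesn’t fix the standard tools, of course; it only gives custom monitoring a saner number to alert on.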

Any one of the above I would consider a deal breaker for large(r) scale deployments of containers at organizations I have worked for. Combine them all? What a mess.

There are other limitations as well, those are just the most severe I see.

Future uses of containers at my organization

I can see future uses of containers at my organization expanding in the production environment, targeting CPU hungry applications and putting them on physical hardware. Maybe even feel brave enough to host multiple applications on the same hardware knowing that I have no good insight into how much each application is using CPU wise(since all current monitoring is performed at the OS level not the application level). Time will tell.

I said earlier we continue to activate “both farms” even though we use containers. In the case of the container hosted application we do not ever activate both farms anymore, but we do have other production web applications that are farm based and living in VMware still, so those we do activate both farms for in anticipation(hopefully) or response to sudden increases in traffic.

Containers inside a hypervisor are a waste of time

In case it isn’t obvious, it is my belief that the main point of using containers is to leverage the underlying hardware of the server platform you are on, and to remove the overhead and costs associated with the hypervisor where possible. Running containers within a hypervisor to me is a misguided effort. Of course I am sure there are people doing this in public clouds because they want to use containers but they are limited by what the “cloud” will give them (hence the original pro-docker article talking about this specific point).

I do not believe that containers themselves have any bearing on deployment of applications in any scenario. They are completely independent things. A container, from a high level (think CxO level) is functionally equivalent to a virtual machine, a concept we have had in the server world for over a decade at this point.

Deep down technically they are pretty different but the concept of segmenting a physical piece of hardware into multiple containers/VMs so that things don’t run over each other is nothing new (and it’s really really old if you get outside of the x86 world I believe IBM has been doing this kind of thing for 30+ years on big iron).

Good use cases for containers at hyper scale

At hyperscale (never having worked at such a scale but I get the gist of how some things operate), all math changes. Every decision is magnified 10,000x.

  • Suddenly saving 5 watts of power on a server is a big deal because you have 150,000 servers deployed.
  • Likewise the few percent of CPU and memory overhead imposed by hypervisors can literally cost an organization millions of $ at high scale.
  • Let alone licensing costs from the likes of VMware etc even with volume/enterprise deals.
  • The time required to launch a VM really is slow compared to launching a container, which again at scale that time really adds up.

There was an article I read last year that said google launches 2,000,000,000 containers per week. Maybe I have launched 4,000 VMs in the past decade – average 7.7 VMs per week(that is aiming really high too). So perspective is in order here. (yes I wanted to write out the 2 billion number that way, nicer perspective). 2 billion per week vs 8 per week, yeah, just slightly different scale here.
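For anyone who wants to check the math, here it is as a few lines of Python. The only inputs are the numbers from the paragraph above, with a decade taken as 10 × 52 weeks.

```python
vms_launched = 4_000                        # my rough (generous) estimate for the decade
weeks_in_decade = 10 * 52
vms_per_week = vms_launched / weeks_in_decade           # about 7.7

google_containers_per_week = 2_000_000_000  # figure quoted from the article
ratio = google_containers_per_week / vms_per_week       # about 260 million

print(f"{vms_per_week:.1f} VMs per week, a ratio of {ratio:,.0f} to 1")
```

So for every VM I launch, Google launches something on the order of 260 million containers.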

At scale you can obviously overcome the limitation of requiring multiple subnets on a server because you have fleets of systems, each fleet probably on various subnets, you’re so big you don’t need to be that consolidated. You probably have a good handle on application-level CPU and memory monitoring(not relying on monitoring of the VM/container as a whole), you probably don’t rely too much on NFS, but instead applications probably use a lot of object storage. You probably never login to the servers so you don’t care what the process list looks like. Your application is probably so fault tolerant that you don’t care about losing a host.

All of these are perfectly valid scenarios to have at a really big scale. But again most organizations will never, ever get to that scale. I’ll say again I believe firmly that trying to build for that level of scale from the outset is a mistake because you will very likely do it wrong, even if you think you know what you are doing.

I’ll use another example here, again taking from one of my comments from el reg recently. I had a job interview back in 2011 at a mid sized company in Seattle, they probably had a few hundred servers, and a half dozen to dozen or so people in the operations group(s). They had recently hired some random guy(random to me anyway) out of Amazon who proclaimed he was a core part of building the Amazon cloud (yet his own linkedin profile said he was just some random engineer there). He talked the talk, I obviously didn’t know him so it was hard to judge his knowledge based on a 1-2 hour interview with him. Our approaches were polar opposite to each other. I understood his approach(the Amazon way), and I understood my approach(the opposite). Each has value in certain circumstances. It was the only interview I’ve ever had where I was really close to just standing up and walking out. My ears were hot, I could tell I would not get along with this person. I kept my BS going though because I was looking for a new job.

The next day or the day after they offered me the job(apparently this guy liked me a lot), I declined politely and accepted the position I am at now and relocated to the bay area a couple of months later.

I had friends who knew this company and kept me up to date on what was going on over there. This guy wanted to build an Amazon cloud at this company. An ambitious goal to be sure, I believed firmly they weren’t going to be able to do it, but this guy believed they could. So they went down the procurement route, and it was rough going. At one point their entire network team quit en-masse because they did not agree with what this guy was doing. He was basically trying to find the cheapest hardware money could buy and wanted to make it “cloud”. He was clueless but their management bought into his BS for some time. He wrecked the group, and within a year I want to say I was informed that not only was he fired but he was escorted out of the building. The company paid through the nose to hire a new team because word got around, nobody wanted to work there. Last I heard they were doing well, had long abandoned the work this person had tried to do.

He had an idea, he had some experience, he knew what he wanted to do. He didn’t realize the organization lacked the ability to execute on that vision. I realized this during my one day interview there but he had no idea, or didn’t care (maybe he thought if they just work hard enough they can make it work).

Anyway perhaps an extreme example, but one that remains fresh in my mind.


Simply trying to do something just because Amazon, or Google(hello hipster Hadoop users from the past decade) or even Microsoft is doing it doesn’t automatically make it a good idea for your organization, you’ve got to have the ability to execute on it, and in many cases execution turns out to be much harder than it appears(I once had one VP tell me he wanted to use HDFS for vmware storage, are you kidding me? At the same company the CTO wanted to entertain the idea of using FreeNAS for their high volume data processing TBs of data per day hundreds of megabytes of throughput per second for their mission critical data, the question was so absurd I didn’t know how to respond at the time).

I re-read what I wrote in the original container hype article many times(as I always re-read many times and make corrections). I realized pretty quickly that the person who wrote the original pro-docker container article I was quoting really seemed to me like a young developer who lacked experience working on anything other than really toy applications. One of the system administrators I know outright said at one point he just stopped reading that (pro-docker) article because the arguments were just absurd. But those points did seem to me to be along the lines of what I have been hearing for the past year so I believed it was a well formed post that I could leverage to respond to.

September 1, 2015

HP 3PAR Case study with my organization

Filed under: Storage — Tags: , — Nate @ 7:39 am
Stella & Dot HP 3PAR Case Study

Stella & Dot relies on HP 3PAR StoreServ Storage

“Highly reliable HP 4-node storage architecture supports over 30,000 e-commerce Independent Business Owners worldwide”

This is also available on HP’s case study website

I have had this blog since July 2009, and I don’t believe ever once have I mentioned any of my employers names. This will be an exception to that record.

HP came to us last year when we were in the market for their 3PAR 7450 all-flash system. There were some people in management over there that really liked our company’s brand and I’m told practically everyone in 3PAR is aware of me. So they wanted to do a case study with the company I work for on our usage of 3PAR. I have participated in one, or maybe two 3PAR case studies in the past, prior to the HP acquisition. The last one was in 2010 on a 3PAR T400. That particular company I was with had a policy where nobody with less than a VP title could be quoted. So my boss’s boss got all the credit even though of course I did all of the magic. Coincidentally I left that company just a few months later (for a completely different reason).

This company is different, I’ve had an extremely supportive management for my entire four years at this company and the newest management that joined in late 2013/early 2014 has been even more supportive. They really wanted me to get as much credit as I could get for all the hard work I do. So it’s my name all over the case study not theirs.  It’s people like this that more than anything keep me happy in my current role, I don’t see myself going anywhere anytime soon (added bonus is the company is stable and I believe will have no trouble surviving the next tech crash without an issue since we aren’t tech oriented).

Anyway, the experience with HP in making the case study was quite good. They are very patient, we said we needed time to work with the new system before we did the case study. They told us take all the time we want, no rush. About 8 months into using the new 7450 they reached out again and we agreed to start the process.

I spent a couple hours on the phone with them, and exchanged a few emails. They converted a lot of my technical jargon into marketing jargon (I don’t actually talk like that!), though the words seemed reasonable to me. The only thing that is kind of a mistake in the article is we don’t leverage any of the 3PAR “Application Suites”. I mentioned this to them saying they can remove those references if they wish, I didn’t care either way. At the end they also make reference to a support event I had with 3PAR five years ago which was at the previous company, and they credited HP for it when technically it was well before the acquisition(told them that as well, though seems reasonable to credit HP to some extent for that since I’d wager the same staff that performed those actions worked at HP for a while anyways, or maybe they are still there).

I would wager that my feedback into the benefits I see with 3PAR are probably not typical among HP customers. The HP Solutions Architect assigned to our account has told me on several occasions that he believes I operate my 3PAR systems better than most any other customer he’s seen, that made me feel good even though I already felt I operate my systems pretty well!

On that note, our SaaS monitoring service Logic Monitor is working with me to try to formalize my custom 3PAR monitoring I wrote (which gathers about 12,000 data points a minute from our three arrays) into something more customers can leverage, and if they can get that done then I hope I can get HP to endorse their service for monitoring 3PAR because it really works well in general, and better than any other tool I’ve seen or used for my 3PAR monitoring needs at least.

3PAR 8000 & 20450

I’m pretty excited about the new 8000-series and the new 20450 (4-node 20k series) that came out a few days ago. I would say really excited but given 3PAR’s common architecture, the newer form factors were already expected by me. I am quite happy that HP released an 8440 with the exact same specs as the 8450 (meaning 3X more data cache than the 8400 and more, faster CPU cores), also the 4-node 20450 has the same cache and CPU allocations per-node that the all-flash 8-node 20850 has. This means you can get these systems and not be “forced” into an all flash configuration, since the 8450 and the 20850 systems are “marketing limited” to be all flash(to make people like Gartner happy).

August 27, 2015

Container Hype

Filed under: Random Thought — Tags: — Nate @ 5:03 am

(You can see part two of my thoughts on containers here.)

I’ll probably regret this post for a little while at least. Because I happened to wake up at about 3AM this morning and found myself not really falling back asleep quickly and I was thinking about this article I read last night on Docker containers and a couple skype chats I had with people regarding it.

Like many folks, I’ve noticed a drastic increase in the hype around containers, specifically Docker stuff over the past year or so. Containers are nothing new, there have been implementations of them for a decade on Solaris anyway. I think Google has been using them for about a decade, and a lot of the initial container infrastructure in Linux (cgroups etc) came from Google. Red Hat has been pushing containers through their PaaS system Openshift I want to say for at least five years now since I first started hearing about it.

My personal experience with containers is limited – I have deployed a half dozen containers in production for a very specific purpose a little over a year ago using LXC on Ubuntu. No Docker here, I briefly looked into it at the time and saw it provided no value to what I do so I went with plain LXC. The containers have worked well since deployment and have completely served their purpose. There are some serious limitations to how containers work(in the Linux kernel) which today prevent them from being used in a more general sense, but I’ll get to that in another post perhaps (this one ended up being longer than I thought). Perhaps since last year’s deployment some of those issues have been addressed I am not sure. I don’t run bleeding edge stuff.

I’ll give a bit of my own personal experience here first so you get an idea where I am coming from. I have been working in technical operations for organizations (five of them at this point) running more or less SaaS services (though internally I’ve never heard that term tossed about at any company, we do run a hosted application for our customers) for about 12 years now. I manage servers, storage, networking, security to some degree, virtualization, monitoring etc(the skills and responsibilities have grown over time of course). I know I am really good at what I do(tried to become less modest over recent couple of years). The results speak for themselves though.

My own personal experience again here: I can literally count on one hand the number of developers I have worked with over the years that stand out as what I might call “operationally excellent”. Fully aware of how their code or application will work in production and builds things with that in mind, or knows how to engage with operations in a really productive way to get questions answered on how best to design or build things. I have worked with dozens of developers (probably over 100 at this point), some of them try to do this, others don’t even bother for some reason or another. The ones I can count on one hand though, truly outstanding, a rare breed.

Onto the article. It was a good read, my only real question is does this represent what a typical Docker proponent thinks of when they think of how great Docker is or how it’s the future etc. Or is there a better argument. Hopefully this represents what a typical Docker person thinks so I’m not wasting my time here.

So, to address point by point, try to keep it simple

“Up until now we’ve been deploying machines (the ops part of DevOps) separately from applications (the dev part). And we’ve even had two different teams administering these parts of the application stack. Which is ludicrous because the application relies on the machine and the OS as well as the code, and thinking of them separately makes no sense. Containers unify the OS and the app within the developer’s toolkit.”

This goes back to experience. It is quite ludicrous to expect the developers to understand how to best operate infrastructure components, even components to operate their own app (such as MySQL) in a production environment. I’m sure there are ones out there that can effectively do it(I don’t claim to be a MySQL expert even myself having worked with it for 15 years now I’ll happily hand that responsibility to a DBA as do most developers I have worked with), but I would wager that number is less than 1 in 50.

Operating things in a development environment is one thing, go at it, have a VM or a container or whatever that has all of your services. Operating correctly in production is a totally different animal. In a good production environment (hopefully in at least one test environment as well) you have significantly more resources to throw at your application to get more out of it. Things that are just cost prohibitive or even impossible to deploy at a tiny scale in a development environment(when I say development environment I imply that it runs on a developer laptop or something). Even things like connectivity to external dependencies likely don’t exist in a development environment. For all but the most basic of applications production will always be significantly different in many ways. That’s just how it is. You can build production so it’s really close to other environments or even exactly the same but then you are compromising on so much functionality, performance, scalability that you’ve really just shot yourself in the foot and you should hope you don’t get to any kind of thing resembling scale (not “web scale” mind you) because it’s just going to fall apart.

Up until now, we’ve been running our service-oriented architectures on AWS and Heroku and other IaaSes and PaaSes that lack any real tools for managing service-oriented architectures. Kubernetes and Swarm manage and orchestrate these services

First off I’m happy to admit I’ve never heard of Kubernetes and Swarm, I have heard of Heroku but have no idea what it does. I have used AWS in the past (for about 2 years – worst experience of my professional career, I admit I do have PTSD when it comes to Amazon cloud).

I have worked with service-oriented architectures for the past 12 years. My very first introduction to SaaS was an EXTREMELY complicated Java platform that ran primarily on Weblogic+Oracle DB on the back end, with Apache+Tomcat on the front end. Filled with Enterprise Java Beans(EJB), and just dozens of services. Their policy was very tight, NOTHING updates the DB directly without going through a service. No “manual” fixes or anything via SQL(only company I’ve worked at with that kind of policy). Must write an app or something to interface with a service to fix issues. They stuck to it from what I recall while I was there anyway, I admire them for that.

At one point I took my knowledge of the app stack, and proposed a very new architecture for operational deployment, it was much more expensive, because this was before widespread use of VM technology or containers in general. We split the tomcat tiers up for our larger customers into isolated pools(well over 200 new physical servers! that ran at under 5% cpu in general!). The code on all systems was the same but we used the load balancer to route traffic for various paths to different sets of servers. To some extent this was for scaling but the bigger problem this “solved” was something more simple operationally (but was not addressed to-date in the app) – logging. This app generated gobs of logging from tons of different subsystems (all of it going to a centralized structure on each system) making it very difficult to see what log events belonged to what subsystem. It could take hours to trace transactions through the system. Something as simple as better logging, which developers ignored forever, we were able to address by splitting things out. The project started out small scale but ballooned quickly as other people piled in. Approvals came fast and hard for everything. My manager said “aim for the sky because they will reject some things”. I aimed for the sky and got everything I asked for(several million $ worth). I believe eventually they moved to a VM model a few years after I left. We tried to get the developers to fix the code, it never happened, so we did what we had to do to make things more manageable. I recall most everyone’s gleeful reaction the first time they started using the new systems. I know it’s hard to believe, you had to be there to see what a mess it was.
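To make the path-based split a bit more concrete, here is a rough sketch of the routing idea in Python. It is only an illustration, not the load balancer configuration we actually ran; the path prefixes and pool names are made up.

```python
# Hypothetical path prefixes and pool names, for illustration only.
POOLS = {
    "/checkout": "pool-checkout",
    "/search": "pool-search",
    "/api/reports": "pool-reports",
}
DEFAULT_POOL = "pool-general"


def pick_pool(path: str) -> str:
    """Return the server pool for a request path (longest matching prefix wins)."""
    matches = [p for p in POOLS if path == p or path.startswith(p + "/")]
    if not matches:
        return DEFAULT_POOL
    return POOLS[max(matches, key=len)]
```

Every pool runs the same code, so the win is operational: the logs on a given pool only contain events for the paths routed to it.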

Though the app itself was pretty terrible. I remember two members of my team quit within a month and both said something along the lines of “we’ve been at the company 6-9 months and still don’t understand how the application works” (and their job was in part supporting production issues like mine was, I was as close to an expert in the operation of that application as one could get, it wasn’t easy). The data flows of this application were a nightmare, it was my first experience working in SaaS, so as far as I knew it was “normal”. But if I were exposed to that today I would run away screaming. So. many. outages. (bad code, and incredibly over designed) I remember one developer saying “why weren’t you in our planning meeting last year when we were building this stuff?” I said back something like “I was too busy pulling 90 hour weeks just keeping the application running, I had no time for anything else”. I freely admit these days I burned out hard core at that company, took me more than three years to recover. I don’t regret it, it was a good experience, I learned a lot, I had some fun. But it cost me a lot as well. I would not do it again at this point in my career, but if I had the ability to talk to my 2003 self I would tell me to do it.

My first exposure to “micro services” was roughly 2007 at another SaaS company, these services specifically were built with Ruby on Rails of all things. There were a few different ways to approach deploying it. By this time I had started using VMware ESX (my first production deployment of VMware was GSX 3.0 in 2004 in a limited scope production deployment at the previous company I referred to).

Perhaps the most common way would have just been to have an apache instance and the various applications inside of it, keep it simple. Another approach might have been to leverage VMware in a larger scope and build VMs for each micro service (each one had a different code base in subversion, not like it was a bunch of services in a single code base). I took a different approach though, an approach I thought was better(at the time anyway, I still think it was a good choice). I decided to deploy each service on its own apache instance(each listening on a different port) on a single OS image (CentOS or Fedora at the time) running on physical hardware. We had a few “web” servers each running probably 15 apache instances with custom configurations managed by CFengine. The “micro services” talked to each other through an F5 BigIP load balancer. We had other services on these boxes as well, the company had a mod_perl based application stack, and another Tomcat-based application, these all ran on the same set of servers.

A common theme in this for me is: twelve years of working with service-oriented architectures, and eight years of working with “micro services”, and I’ve never needed special sauce to manage them.

Up until now, we have used entire operating systems to deploy our applications, with all of the security footprint that they entail, rather than the absolute minimal thing which we could deploy. Containers allow you to expose a very minimal application, with only the ports you need, which can even be as small as a single static binary.

This point seems kind of pointless to me. Operating systems are there to provide the foundation of the application. I like the approach of trying to keep things common. That is the same operating system across as many of the components as possible – keeping in mind there are far more systems involved than just the servers that “run the application”. While saying minimal exposure is a nice thing to have, at the end of the day it really doesn’t matter(it doesn’t noticeably or in most cases measurably impact operation of the application, but it does improve manageability).

Up until now, we have been fiddling with machines after they went live, either using “configuration management” tools or by redeploying an application to the same machine multiple times. Since containers are scaled up and down by orchestration frameworks, only immutable images are started, and running machines are never reused, removing potential points of failure.

I’ll preface this by saying I have never worked for an organization that regularly or even semi regularly scaled up and scaled down their infrastructure(even while working in Amazon cloud). Not only have they never really done it, but they’ve never really needed to. I’m sure there are some workloads that can benefit from this, but I’m also sure the number is very small. For most you define a decent amount of headroom for your application to burst into and you let it go, and increase it as required(if required) as time goes on with good monitoring tools.

I’ll also say that since I led the technical effort behind moving my current organization out of Amazon cloud in late 2011(that is what I was hired to do, I was not going to work for another company that used them on a regular basis. Per earlier point we actually intended to auto scale up and down but at the end of the day it didn’t happen), we have not had to rebuild a VM, ever. Well over three years now with never having had to rebuild a VM (well there is one exception where we retired one of our many test environments at one point only to have a need for it again a few months later, NOTHING like our experience with public cloud though). So yeah, the lifetimes of our systems are measured in years, not in hours, days, or weeks. Reliability is about as good as you can get in my opinion(again record speaks for itself). We’ve had just two sudden hardware failures causing VMs to fail in the past 3 and a half years. In both cases VMware High Availability automatically restarted the affected VMs on other hosts within a minute, and HP’s automatic server recovery rebooted the hosts in question (in both cases had to get system boards replaced).

Some people when thinking of public cloud say “oh but how can we operate this better than Amazon, or Microsoft etc”. I’m happy to admit now that I KNOW I can operate things better than Amazon, Microsoft, Google etc. I’ve demonstrated it for the past decade, and will continue to do so. Maybe I am unique, I am not sure (I don’t go out of my way to socialize with other people like me). There is a big caveat to that statement that again I’m happy to admit to. The “model” of many of the public cloud players is radically different from my model. The assumptions they make are not assumptions I make (and vice versa). With their model, in order to operate well you really have to design your app(s) to handle it right. With my model you don’t. I freely admit my model would not be good for “web scale”, just like their model is not good for the scale of any company I have worked at for the past 12 years. Different approaches to solve similar issues.

Up until now, we have been using languages and frameworks that are largely designed for single applications on a single machine. The equivalent of Rails’ routes for service-oriented architectures hasn’t really existed before. Now Kubernetes and Compose allow you to specify topologies that cross services.

I’ll mostly have to skip this one as it seems very code related, I don’t see how languages and frameworks have any bearing on underlying infrastructure.

Up until now, we’ve been deploying heavy-weight virtualized servers in sizes that AWS provides. We couldn’t say “I want 0.1 of a CPU and 200MB of RAM”. We’ve been wasting both virtualization overhead as well as using more resources than our applications need. Containers can be deployed with much smaller requirements, and do a better job of sharing.

I haven’t used AWS in years, in part because I believe they have a broken model, something I have addressed many times in the past. I got upset with HP when they launched their public cloud and launched with a similar model. I believe I understand why they do things this way (because doing it “right” at “large scale” is really complicated).

So this point is kind of moot. I mean people have been able to share CPU resources across VMs for well over a decade at this point(something that isn’t possible in all major public cloud providers). I also share memory to an extent(this is handled transparently with the hypervisor). There is certainly overhead associated with VM, and with a “full” operating system image, but that is really the price you pay for the flexibility, and manageability that those systems offer. It’s a price I’m willing to pay in a heartbeat, because I know how to run systems well.

Up until now, we’ve been deploying applications and services using multi-user operating systems. Unix was built to have dozens of users running on it simultaneously, sharing binaries and databases and filesystems and services. This is a complete mismatch for what we do when we build web services. Again, containers can hold just simple binaries instead of entire OSes, which results in a lot less to think about in your application or service.

(side note, check the container host operating system, yeah that one that is running all of the native processes in the same kernel – yes a multi user operating system running dozens of services on it simultaneously, a container is just a veil..)

This is another somewhat moot point. Having a lot less to think about, certainly from an OS perspective to me makes things more complicated. If your systems are so customized that each one is different that makes life more difficult. For me I can count on a common set of services and functionality being available on EVERY system. And yes, I even run local postfix services on EVERY system(oh, sorry that is some overhead). To be clear postfix is there as a relay for mail which is then forwarded to a farm of load balanced utility services which then forward onto external SMTP services. This is so I can do something as simple as “cat some file| mail” and have it work.

Now we do limit what services run, e.g. Chef(current case) or CFengine(prior companies) only runs our core services, extra things that I never use are turned off. Some services are rare or special. Like FTP for example. I do have a couple of use cases for running an FTP server still, and in those cases FTP services only run on the systems that need it. And obviously from an application standpoint not all apps run on all app servers. But this kind of stuff is pretty obvious.

At the end of the day having these extra services provides convenience not only to us, but to the developers as well. Take postfix as an example. Developers came to me one day saying they were changing how they send email: instead of interfacing with some external provider via web API, they would interface with their new provider via SMTP. So where do they direct mail to? My answer was simple – in all cases, in all environments send mail to localhost, we’ll handle it from there. Sure you can put custom logic in your application or custom configurations for each environment if you want to send directly to our utility servers, but we sure as hell don’t want you to try to send mail directly from the web servers to the external parties, that’s just bad practice (for anything resembling a serious application assuming many servers supporting it and not just a single system running all services). The developers can easily track progress of the mail as it arrives on the localhost MTA, and is then immediately routed to the appropriate utility server farm (different farms for different environments due to network partitioning to prevent QA systems for example from talking to production, also each network zone(3 major zones) has a unique outbound NAT address, which is handy in the event of needing IP filters or something).
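From the application side, “send mail to localhost” looks something like the sketch below. It is written in Python purely for illustration (not necessarily what the developers used), and the function names and example addresses are hypothetical. The only real assumption is that an MTA such as postfix is listening on localhost port 25.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message; nothing here is environment specific."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_local_relay(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to the local MTA, which relays it onward."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

The application config is identical in every environment, and the routing to the right utility farm lives entirely in the MTA configuration.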

So again, worrying about these extra services is worrying about nothing in my opinion. My own experience says that the bulk of the problems with applications are code based, sometimes design, sometimes language, sometimes just bugs. Don’t be delusional and think that by deploying containers that will somehow magically make the code better and the application scale and be highly available. It’s addressing the wrong problem, it’s a distraction.

Don’t get me wrong though, I do believe containers have valid use cases, which I may cover in another post since this one is already pretty long. I do use some containers myself (not Docker). I do see value in providing an “integrated” experience (download one file and get everything – even though that has been a feature in virtualized environments for probably close to a decade with OVF as well). That is not a value to me, because as an experienced professional it is rare that something works properly “out of the box”, at least as far as applications go. Just look for example at how Linux distributions package applications, many have their own approach on where to put files, how to manage things etc. That’s the simplest example I can give. But I totally get it for constrained environments it’s nice to be able to get up to speed quickly with a container. There are trade offs certainly once you get to “walking” (thinking baby step type stuff here).

There is a lot of value that operating systems and hypervisors and core services provide. There is overhead associated with them certainly. In true hyper scale setups this overhead is probably not warranted (Google type scale). I will be happy to argue till I am somewhat blue in the face that 99.99% of organizations will never be at that scale, and trying to plan for that from the outset is a mistake(and I’d wager almost all that try, fail because they over design or under design), because there is not one size that fits all. You build to some reasonable level of scale, then like anything new requirements likely come in, and you re evaluate, re-write, re-factor or whatever.

It’s 5AM now, I need to hit the gym.

June 1, 2015

3PAR Gen5: True flash scale storage

Filed under: Storage — Tags: , — Nate @ 6:14 am

(I’m in Vegas, waiting for HP Discover to start(got here Friday afternoon), this time around I had my own company pay my way, so HP isn’t providing me with the trip. Since I haven’t been blogging much the past year I didn’t feel good about asking HP to cover me as a blogger.)

UPDATED – 6/4/2015

When flash first started becoming a force to be reckoned with a few years ago in enterprise storage it was clear to me controller performance wasn’t up to the task of being able to exploit the performance potential of the medium. The ratio of controllers to SSDs was just way out of whack.

In my opinion this has been largely addressed in the 3PAR Generation 5 systems that are being released today.

3PAR 20k system in the flesh

NEW HP 3PAR 20800

I believe this replaces the 10800/V800 (2-8 controllers) system, I believe it also replaces the 10400/V400(2-4 controllers) system as well.

  • 2-8 controllers, max 96 x 2.5GHz CPU cores(12/controller) and 16 Generation 5 ASICs (2/controller)
  • 224GB of RAM cache per controller (1.8TB total – 768GB control / 1,024GB data)
  • Up to 32TB of Flash read cache
  • Up to 6PB of raw capacity (SSD+HDD)
  • Up to 15PB of usable capacity w/deduplication
  • 12Gb SAS back end, 16Gb FC (Max 160 ports) / 10Gb iSCSI (Max 80 ports) front end
  • 10 Gigabit ethernet replication port per controller
  • 2.5 Million Read IOPS under 1 millisecond
  • Up to 75 Gigabytes/second throughput (read), up to 30 Gigabytes/second throughput (write)

NEW HP 3PAR 20850

This augments the existing 7450 with a high end 8-controller capable all flash offering, similar to the 20800 but with more adrenaline.

  • 2-8 controllers, max 128 x 2.5GHz CPU cores(16/controller), and 16 Generation 5 ASICs (2/controller)
  • 448GB of RAM cache per controller (3.6TB total – 1,536GB control / 2,048GB data)
  • Up to 4PB of raw capacity (SSD only)
  • Up to 10PB of usable capacity w/deduplication
  • 12Gb SAS back end, 16Gb FC (Max 160 ports) / 10Gb iSCSI (Max 80 ports) front end
  • 10 Gigabit ethernet replication port per controller
  • 3.2 Million Read IOPS under 1 millisecond
  • 75 Gigabytes/second throughput (read), 30 Gigabytes/second throughput (write)

Even though flash is really, really fast, 3PAR leverages large amounts of cache to optimize writes to the back end media, this not only improves performance but extends SSD life. If I recall correctly nearly 100% of the cache on the all-SSD systems is write cache, reads are really cheap so very little caching is done on reads.

It’s only going to get faster

One of the bits of info I learned is that the claimed throughput by HP is a software limitation at this point. The hardware is capable of much more and performance will improve as the software matures to leverage the new capabilities of the Generation 5 ASIC (code named “Harrier 2”).

Magazine sleds are no more

Since the launch of the early 3PAR high end units more than a decade ago 3PAR had leveraged custom drive magazines which allowed them to scale to 40×3.5″ drives in a 4U enclosure. That is quite dense, though they did not have similar ultra density for 2.5″ drives. I was told that nearline drives represent around 10% of disks sold on 3PAR so they decided to do away with these high density enclosures and go with 2U enclosures without drive magazines. This allows them to scale to 48×2.5″ drives in 4U (vs 40 2.5″ in 4U before), but only 24×3.5″ drives in 4U (vs 40 before). Since probably greater than 80% of the drives they ship now are 2.5″ that is probably not a big deal. But as a 3PAR historian(perhaps) I found it interesting.

Along similar lines, the new 20k series systems are fully compatible with 3rd party racks. With the previous 10k series, as far as I know, the 4-node variant was 3rd party compatible but the 8-node required a custom HP rack.

Inner guts of a 3PAR 20k-series controller

Each controller has dual internal SATA SSDs for the operating system(I assume some sort of mirroring going on) and eight memory slots for data cache(max 32GB per slot) – these are directly connected to the pair of ASICs(under the black heatsinks). Another six memory slots are for the control cache(operating system, meta data etc), also max 32GB per slot; those are controlled by the Intel processors under the giant heat sinks.
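The slot counts line up with the 20850 cache numbers in the spec list. A quick back-of-the-envelope check in Python, assuming every slot is filled with the 32GB maximum and all eight controllers are present (the variable names are just mine):

```python
SLOT_GB = 32         # max module size per slot
DATA_SLOTS = 8       # slots attached to the ASICs (data cache)
CONTROL_SLOTS = 6    # slots attached to the Intel CPUs (control cache)
CONTROLLERS = 8      # assumes a fully populated 8-controller system

data_per_controller = DATA_SLOTS * SLOT_GB              # 256 GB
control_per_controller = CONTROL_SLOTS * SLOT_GB        # 192 GB
per_controller = data_per_controller + control_per_controller  # 448 GB

data_total = data_per_controller * CONTROLLERS          # 2,048 GB
control_total = control_per_controller * CONTROLLERS    # 1,536 GB
system_total = per_controller * CONTROLLERS             # 3,584 GB, about 3.6TB

print(per_controller, control_total, data_total, system_total)
```

That works out to 448GB per controller and roughly 3.6TB across the system, split 1,536GB control / 2,048GB data.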

3PAR 20k series controller

How much does it cost?

I’m told the new 20k series is surprisingly cost effective in the market. The price point is significantly lower than earlier generation high end 3PAR systems. It’s priced higher than the all flash 7450 for example but not significantly more, entry level pricing is said to be in the $100,000 range, and I heard a number tossed around that was lower than that. I would assume that the blanket thin provisioning licensing model of the 7000 series extends to the new 20k series, but I am not certain.

It is only available in an 8-controller capable system at this time, so requires at least 16U of space for the controllers since they are connected over the same sort of backplane as earlier models. Maybe in the future HP will release a smaller 2-4 controller capable system or they may leave that to whatever replaces the 7450. I hope they come out with a smaller model because the port scalability of the 7000-series(and F and E classes before them) is my #1 complaint on that platform, having only one PCIe expansion slot/controller is not sufficient.


HP says that the new 3.84TB SSD is the Sandisk Optimus MAX 4TB drive, which is targeted at read intensive applications. If a customer decides to use this drive in a non read intensive(and non 3PAR) setup then the capacity will drop to 3.2TB. So with 3PAR’s adaptive sparing they are able to gain roughly 600GB of capacity on the drive while simultaneously supporting any workload without sacrificing anything.

This is double the size of the 1.92TB drive that was released last year. It will be available on at least the 20k and the 7450, most likely all other Gen4 3PAR platforms as well.

HP says this drops the effective cost of flash to $1.50/GB usable.

This new SSD comes with the same 5 year unconditional warranty that other 3PAR SSDs already enjoy.

I specifically mention the 7450 here because this new SSD effectively doubles the raw capacity of the system to 920TB of raw flash today (vs 460TB before). How many all flash systems scale to nearly a petabyte of raw flash?

NEW HP 3PAR Persistent Checksum

With the Generation 4 systems 3PAR had end to end T10 data integrity checking within the array itself from the HBAs to the ASICs, to the back end ports and disks/SSDs. Today they are extending that to the host HBAs, and fibre channel switches as well (not sure if this extends to iSCSI connections or not).

The Generation 5 ASIC has a new line rate SHA1 engine which replaces the line rate CRC engine in Generation 4 for even better data protection. I am not certain if persistent checksum is Generation 5 specific(given they are extending it beyond the array I really would expect it to be possible in Generation 4 as well).

NEW HP 3PAR Asynchronous Streaming Replication

I first heard about this almost two years ago at HP Storage Tech Day, but today it’s finally here. HP adds another method of replication to the existing sets they already had:

  • Synchronous replication – 0 data loss (strict latency limits)
  • Synchronous long distance replication (requires 3 arrays) – 0 data loss (latency limits between two of the three arrays)
  • Asynchronous replication – as low as 5 minutes of data loss (less strict latency limits)
  • Asynchronous streaming replication – as low as 1 second of data loss (less strict latency limits)

HP compares this to EMC’s SRDF async replication which has as low as 15 seconds of data loss, vs 3PAR with as low as 1 second.

If for some reason more data comes into the system than the replication link can handle, the 3PAR will automatically go into asynchronous replication mode until the replication is caught up then switch back to asynchronous streaming.
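The fallback behavior can be pictured with a toy model like the one below. To be clear, this is my own sketch in Python, not how the array is actually implemented; the buffer threshold, the per-tick accounting and the mode names are all assumptions made up for illustration.

```python
def simulate(writes_mb, link_mb_per_tick, buffer_mb):
    """Toy model: return the replication mode after each tick.

    writes_mb        -- incoming write volume per tick (hypothetical units)
    link_mb_per_tick -- how much the replication link can ship per tick
    buffer_mb        -- assumed backlog limit before streaming gives up
    """
    mode = "streaming"
    backlog = 0.0
    modes = []
    for incoming in writes_mb:
        # Whatever the link cannot ship this tick is carried over as backlog.
        backlog = max(0.0, backlog + incoming - link_mb_per_tick)
        if mode == "streaming" and backlog > buffer_mb:
            mode = "periodic"    # fall back to regular asynchronous replication
        elif mode == "periodic" and backlog == 0.0:
            mode = "streaming"   # caught up, resume streaming
        modes.append(mode)
    return modes
```

For example, `simulate([50, 300, 300, 0, 0, 0, 0], 100, 150)` stays in streaming for the first tick, drops to periodic when the burst arrives, and only returns to streaming on the last tick once the backlog has drained.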

This new feature is available on all Gen4 and Gen5 systems.

NEW Co-ordinated Snapshots

3PAR has long had the ability to snapshot multiple volumes on a single system simultaneously, and it’s always been really easy to use. Now they have extended this to be able to snapshot across multiple arrays simultaneously and make them application aware (in the case of VMware initially, Exchange, SQL Server and Oracle to follow).

This new feature is available on all Gen4 and Gen5 systems.

HP 3PAR Storage Federation

Up to 60PB of usable capacity and 10 Million IOPS with zero overhead

3PAR Federation

HP has talked about Storage Federation in the past, today with the new systems of course the capacity knobs have been turned up a lot, they’ve made it easier to use than earlier versions of the software, though they don’t yet have completely automatic load balancing between arrays.

This federation is possible between all Gen4 and Gen5 systems.

Benefits from ASIC Acceleration

3PAR has always used in-house custom ASICs on their systems and these are no different:

The ASICs within each HP 3PAR StoreServ 20850 and 20800 Storage controller node serve as the high-performance engines that move data between three I/O buses, a four memory-bank data cache, and seven high-speed links to the other controller nodes over the full-mesh backplane. These ASICs perform RAID parity calculations on the data cache and inline zero-detection to support the system’s data compaction technologies. CRC Logical Block Guard used by T10-DIF is automatically calculated by the HBAs to validate data stored on drives with no additional CPU overhead. An HP 3PAR StoreServ 20800 Storage system with eight controller nodes has 16 ASICs totaling 224 GB/s of peak interconnect bandwidth.

3PAR 208x0 Controller Architecture

NEW Online data import from HDS arrays

You are now able to do online import of data volumes from Hitachi arrays in addition to the EMC VMAX, CX4, VNX, and HP EVA systems.

The competition

HP touts the scalability of usable and raw flash capacity of these new systems + the new 3.84TB SSD against their competition:

  • Consolidate thirty Pure Storage //m70 storage systems onto a single 3PAR 20850 (with 87% less power/cooling/space) ***
  • Consolidate eight XtremIO storage systems onto a single 3PAR 20850 (with 62% less power/cooling/space)
  • Consolidate three EMC VMAX 400K storage systems onto a single 3PAR 20850 (with 85% less power/cooling/space)

HP also touts their throughput numbers (75GB/second) are between two and ten times faster than the competition. The 7450 came in at only 5.5GB/second, so this is quite a step up.

*** HP revised their presentation at the last minute; their original claims were against the Pure 450, which was replaced by the m70 on the same day as the 3PAR announcement. The numbers here are from memory from a couple of days ago, so they may not be completely accurate.

Fastest growing in the market

HP touted again 3PAR was the fastest growing all flash in the market last year. They also said they have sold more than 1,000 all flash systems in the first half which is more than Pure Storage sold in all of last year. In other talks with 3PAR folks specifically on market share they say they are #1 in midrange in Europe and #2 in Americas, with solid growth across the board consistently for many quarters now. 3PAR is still #5 in the all flash market, part of that is likely due to compression(see below), but I have no doubt this new generation of systems will have a big impact on the market.

Still to come

Compression remains a road map item, they are working on it, but obviously not ready for release today. Also this marks probably the first 3PAR hardware released in more than a decade that wasn’t accompanied by SPC-1 results. HP says SPC-1 is coming, and it’s likely they will do their first SPC-2 (throughput) test on the new systems as well.

Bottom line

HP continues to show that its 3PAR architecture is fully capable of embracing the all flash era and has a long life left in it. Not only are you getting the maturity of the enterprise proven 3PAR systems (over a decade at this point), but you are not having to compromise on almost anything else related to all flash(compression being the last holdout).

March 4, 2015

Sign off ?

Filed under: Random Thought — Nate @ 9:59 am

So I apologize (again) for not posting much, and not replying to comments recently.

I suppose it’s obvious I haven’t posted in a long time. I have mentioned this many times before but there really isn’t much in tech that has gotten me excited in probably the past two years. I see new things and am just not interested anymore for whatever reason.

I have been spending some time with the 3PAR 7450 that I got late last year that is a pretty cool box but at the end of the day it’s the same 3PAR I’ve known for the past 8 years just with SSDs and dedupe (which is what I wanted, I needed something I felt I could rely on for the business I work for, I have become very conservative when it comes to storage over the years).

That and there’s been a lot of cool stuff going on with me outside of tech so I am mostly excited about that and have been even less focused on tech recently.

I pushed myself harder than I thought possible for more than a decade striving to be the best that I could be in the industry and think I accomplished a lot (at one point last year a former boss of mine said they hired 9 people to do my job after I left that particular company. Other positions are/were similar, perhaps not as extreme.)

I am now pushing myself harder than I ever thought possible in basically everything BUT tech, in part to attempt to make up for sacrifices made over the previous decade. So I am learning new things, just not as much in technology and I don’t know how long this journey will take.

I can’t put into words how excited I am.

Interesting tech areas I have spent some time on in recent months that may get a blog post at some point include:

  • LogicMonitor – the most advanced/easy to use dashboarding/graphing system I’ve ever come across. It does more than dashboards and graphs but to-date that is all I’ve used it for and it pays for itself 5x over with that alone for me. I’ve spent a bunch of time porting my custom monitors over to it including collecting more than 12,000 data points/minute from my 3PAR systems! I can’t say enough good things about this platform from the dashboard/graphing standpoint(since that is all I use it for right now)
  • ScaleArc – Sophisticated database availability tool. I’m using it for MySQL though they support other DBs as well. Still in the very early stages of deployment.
  • HP StoreOnce – not sure I have much to write about this, since I only use it as a NAS, all of the logic is in my own scripts. But getting 33.6:1 reduction in data on 44TB of written user data is pretty sweet for me, beats the HELL out of the ZFS system I was using for this before(maybe 5:1 reduction with ZFS).
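To put those reduction ratios in perspective, here is the arithmetic as a small Python sketch. The helper name is mine; the 44TB, 33.6:1 and roughly 5:1 figures are the ones quoted above.

```python
def stored_tb(written_tb: float, ratio: float) -> float:
    """Physical capacity consumed for a given amount of written data and reduction ratio."""
    return written_tb / ratio


storeonce = stored_tb(44, 33.6)   # about 1.3TB on disk
zfs = stored_tb(44, 5)            # about 8.8TB on disk at the rough 5:1 ZFS figure

print(f"StoreOnce: {storeonce:.2f}TB, ZFS: {zfs:.2f}TB")
```

So the same 44TB of user data would take up well under a sixth of the space it would have at the ZFS ratio.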

So, this may be the last blog post for a while(or forever) I am not sure, for anyone out there still watching, thanks for reading over the years, thanks for the comments, and wish you the best!


November 7, 2014

Two factor made easy

Filed under: Random Thought,Security — Nate @ 12:04 am

Sorry been really hammered recently, just spent the last two weeks in Atlanta doing a bunch of data center work(and the previous week or two planning for that trip), many nights didn’t get back to the hotel until after 7AM .. But got most of it done..still have a lot more to do though from remote.

I know there have been some neat 3PAR announcements recently; I plan to try to cover them soon.

In the meantime onto a new thing to me: two factor authentication. I recently went through preparations for PCI compliance and among those things we needed two factor authentication on remote access. I had never set up nor used two factor before. I am aware of the common approach of using a keyfob or mobile app or something to generate random codes etc. Seemed kind of, I don’t know, not user friendly.

In advance of this I was reading a random thread on slashdot about something related to two factor, and someone pointed out the company Duo Security as one option. The PCI consultants I was working with had not used it and had proposed another (self hosted) option which involved integrating our OpenLDAP with it, along with radius and mysql and a mobile app or keyfob with codes, and well, it just all seemed really complicated (compounded by the fact that we needed to get something deployed in about a week). I especially did not like the having to type in a code bit. I mean, it wasn’t too long before that I got a support request from a non technical user trying to login to our VPN – she would login and the website would prompt her to download & install the software. She would download the software (but not install it) and think it wasn’t working – then try again (download and not install). I wanted something simpler.

So enter Duo Security, a SaaS platform for two factor authentication that integrates with quite a bit of back end things including lots of SSL and IPSec VPNs (and pretty much anything that speaks Radius which seems to be standard with two factor).

They tie it all up into a mobile app that runs on several different major mobile platforms both phone and tablet. The kicker for them is there are no codes. I haven’t seen any other two factor systems personally that are like this (have only observed maybe a dozen or so, by no means am I an expert at this). The ease of use comes in two forms:

Inline self enrollment for our SSL VPN

Initial setup is very simple, once the user types their username and password to login to the SSL VPN (which is browser based of course), an iframe kicks in (how this magic works I do not know) and they are taken through a wizard that starts off looking something like this


No separate app, no out of band registration process.

By comparison (and this is what prompted me to write this now), I just went through a two factor registration process for another company (which requires it now) that uses something called Symantec Validation & ID Protection, which is also a mobile app. Someone had to call me on the phone, I told them my Credential ID and a security code, then I had to wait for the 2nd security code and tell them that, and that registered my device with whatever they use. Compared to Duo this is a positively archaic solution.

Yet another service provider I interact with regularly recently launched (and is pestering me to sign up for) two factor authentication – they too use these old fashioned codes. I’ve been hit with more two factor related things in the past month than in the past probably 5 years or something.


Sync your phone with Duo security by scanning a QR code with your phone (obscured the QR code a bit just in case that has sensitive info in it)

By contrast the self enrollment in Duo is simple, requires no interaction on my part, users can enroll whenever they want. They can even register multiple devices on their own, and add/delete devices if they wish.

One of the times during testing I did have an issue scanning the QR code, which normally takes about 2 seconds on my phone. I was struggling with it for a minute or two, until I realized my mouse cursor was on top of it, which was blocking the scan from working. Maybe they could improve it by somehow cloaking the mouse cursor with javascript or something if it goes over the code, I don’t know.

Don’t have a mobile app? Duo can use those same old fashioned key codes too (by their or 3rd party keyfob or mobile app), or they can send you an SMS message, or make a voice call to you (the prompt basically says hit any button on the touch tone phone to acknowledge the 2nd factor; of course that phone# has to be registered with them).

Simply press a button to acknowledge 2nd factor

The other easy part is there are of course no codes to have to transcribe from a device to the computer. If you are using the mobile app, upon login you get a push notification from the app (in my experience more often than not this comes in less than 2 seconds after I try to login). The app doesn’t have to be running (it runs in the background even if you reboot your phone). I get a notification in Android (in my case) that looks like this:

Duo integrated nicely into Android

I obscured the IP address and the company name just to try to keep this not associated with the company I work for. If you have the app running in the foreground you can see a full screen login request similar to the smaller one above. If for some reason you are not getting the push notification you can tell the app to poll the Duo service for any pending notifications (I have only had to do that once so far).

The mobile app also has one of those number generator things so you can use that in the event you don’t have a data connection on the phone. In the event the Duo service is off line you have the option of disabling 2nd factor automatically(default) so them being down doesn’t stop you from getting access, or if you prefer ultra security you can tell the system to prevent any users from logging in if the 2nd factor is not available.
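
The two outage behaviors described above are usually called "fail open" and "fail closed". A minimal sketch of that decision, assuming made-up names (`FailMode`, `allow_login`) that are not Duo’s actual configuration options:

```python
from enum import Enum
from typing import Optional


class FailMode(Enum):
    OPEN = "open"      # let users in when the 2nd factor service is unreachable
    CLOSED = "closed"  # refuse all logins when the 2nd factor service is unreachable


def allow_login(password_ok: bool,
                second_factor: Optional[bool],
                fail_mode: FailMode) -> bool:
    """Decide whether a login succeeds.

    second_factor is True (approved), False (denied),
    or None (the 2nd factor service could not be reached).
    """
    if not password_ok:
        return False                        # first factor always has to pass
    if second_factor is None:
        return fail_mode is FailMode.OPEN   # outage: the policy decides
    return second_factor
```

Fail open trades some security for availability during an outage; fail closed does the opposite.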

Normally I am not one for SaaS type stuff – really the only exception is if the SaaS provides something that I can’t provide myself. In this case the simple two factor stuff, the self enrollment, and the ability to support SMS and phone voice calls (which about a half dozen of my users have opted to use) are not anything I could have set up in a short time frame anyway (our PCI consultants were not aware of any comparable solution – and they had not worked with Duo before).

Duo claims to be able to set up in just a few minutes – for me the reality was a little different. The instructions they had were only half of what I needed for our main SSL VPN, so I had to resort to instructions from our VPN appliance maker to make up the difference (and even then I was really confused, until support explained it to me. Their instructions were specifically for two factor on IOS devices though applied to my scenario as well). For us the requirement is that the VPN device talk to BOTH LDAP and Radius. LDAP stores the groups that users belong to, and those groups determine what level of network access they get. Radius is the 2nd factor (or in the case of our IPSec VPN the first factor too, more on that in a moment). In the end it took me probably 2-3 hours to figure it out; about half of that was wondering why I couldn’t login (because I hadn’t set up the VPN->LDAP link, so the authentication wasn’t getting my group info and I was not getting any network permissions).

So for our main SSL VPN, I had to configure a primary and a secondary authentication, and initially with Duo I just kept it in pass through mode (only talking to them and not any other authentication source) because the SSL VPN was doing the password auth via LDAP.

When I went to hook up our IPSec VPN that was a different configuration, that did not support dual auth of both LDAP and Radius, it could do LDAP group lookups and password auth with radius though.  So I put the Duo proxy in a more normal configuration which meant I needed another Radius server that was integrated with our LDAP(which runs on the same VM as the Duo proxy on a different port) that the Duo proxy could talk to(talks to localhost) in order to authenticate passwords. So the IPSec VPN would send a radius request to the Duo proxy which would then send that information to another Radius (integrated with LDAP) and to their SaaS platform, and give a single response back to allow or deny the user.

At the end of the day the SSL VPN ends up authenticating the user’s password twice (once via LDAP once via RADIUS), but other than being redundant there is no harm.
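
The IPSec flow described above (VPN sends a Radius request to the Duo proxy, the proxy checks the password against the LDAP-backed Radius and the second factor against the SaaS platform, then returns one answer) can be sketched roughly like this. The function and parameter names are hypothetical stand-ins, not the Duo proxy’s real interface, and checking the password before the second factor is my assumption about the ordering.

```python
from typing import Callable


def proxy_authenticate(user: str,
                       password: str,
                       primary_auth: Callable[[str, str], bool],
                       second_factor: Callable[[str], bool]) -> str:
    """Combine password auth and the 2nd factor into one Radius-style reply.

    primary_auth  : stand-in for the Radius server integrated with LDAP
    second_factor : stand-in for the push/SMS/call approval via the SaaS side
    """
    if not primary_auth(user, password):
        return "Access-Reject"   # bad password: never bother the user's phone
    if not second_factor(user):
        return "Access-Reject"   # password fine, 2nd factor denied
    return "Access-Accept"       # both factors passed, single response to the VPN
```

The VPN only ever sees the final accept or reject, which is why it does not need to support dual authentication itself.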

Here is what the basic architecture looks like, this graphic is more ugly than my official one since I wanted to hide some of the details, you can get the gist of it though


Two factor authentication for SSL, IPSec and SSH with redundancy

The SSL VPN supported redundant authentication schemes, so if one Duo proxy was down it would fall back to another one. The problem was the timeout was too long: it would take upwards of 3 minutes to login (and you are in danger of the login timing out). So I set up a pair of Duo proxies and am load balancing between them with a layer 7 health check. If a failure occurs there is no delay in login and it just works better.
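
The idea behind the health-checked pair can be sketched as below. This is only an illustration of the technique, not how any particular load balancer implements it: `Balancer`, the proxy names and the `is_healthy` callback are all made up, and a real layer 7 check would send an actual test request to each proxy rather than call a Python function.

```python
from typing import Callable, List, Optional


class Balancer:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> Optional[str]:
        n = len(self.backends)
        for i in range(n):
            candidate = self.backends[(self._next + i) % n]
            if self.is_healthy(candidate):
                self._next = (self._next + i + 1) % n
                return candidate
        return None  # every backend is down


healthy = {"duo-proxy-1", "duo-proxy-2"}
lb = Balancer(["duo-proxy-1", "duo-proxy-2"], lambda b: b in healthy)
```

Because the unhealthy proxy is skipped before the login request is sent, the user never waits on a timeout.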

As the image shows I have integrated SSH logins with Duo as well in a couple of cases. There is no inline pretty self enrollment, but if you happen to not be enrolled, the two factor process will spit out a URL upon first login to the SSH host, which you put into your browser to enroll in two factor.

I deployed the setup to roughly 120 users a few weeks ago, and within a few days roughly 50-60 users had signed up. Internal IT said there were zero – count ’em zero – help desk tickets related to the new system, it was that easy and functional to use. My biggest concern going into this whole project was tight timelines and really no time for any sort of training. Duo Security made that possible (even without those timelines I still would have preferred this solution, or at least this type of solution, assuming there is something else similar on the market; I am not aware of any).

My only support tickets to-date with them were two users who needed to re-register their devices(because they got new devices). Currently we are on the cheaper of the two plans which does not allow self management of devices. So I just login to the Duo admin portal, delete their phone and they can re-enroll at their leisure.

Duo’s plans start as low as $1/user/month. They have a $3/user/month enterprise package which gives more features. They also have an API package for service providers and stuff which I think is $3/user/year (with a minimum number of users).

I am not affiliated with Duo in any way, not compensated by them, not bribed not given any fancy discounts.. but given I have written brief emails to the two companies that have recently deployed two factor I thought I would write this so I could point them and others to my story here to get more insight on a better way to do two factor authentication.

September 17, 2014

NetApp Flash ray ships… with one controller

Filed under: Storage — Tags: , , — Nate @ 10:55 am

Well I suppose it is finally out, or at least in a “limited” way. NetApp apparently is releasing their ground-up rewrite all Flash product Flash Ray, based on a new “MARS” operating system (not related to Ontap).

When I first heard about MARS I heard some promising things, I suppose all of those things were just part of the vision, obviously not where the product is today on launch day. NetApp has been carefully walking back expectations all year. Which turned out to be a smart move, but it seems they didn’t go far enough.

To me it is obvious that they felt severe market pressures and could no longer risk going to market without their next gen platform available. It’s also obvious that Ontap doesn’t cut it for flash or they wouldn’t have built Flash Ray to begin with.

But shipping a system that only supports a single controller I don’t care if it’s a controlled release or not – giving any customer such a system under any circumstance other than alpha-quality testing just seems absurd.

The “vision” they have is still a good one, on paper anyway. I’m really curious how long it takes them to execute on that vision, given the time it took to integrate the Spinnaker stuff into Ontap. Will it take several years?

In the meantime, while you’re waiting for this vision to come out, I wonder what NetApp will offer to get people to want to use this product vs any one of the competing solutions out there. Perhaps by the time this vision is complete this first or second generation of systems will be obsolete anyway.

Current FlashRay system seems to ship with less than 10TB of usable flash (in one system).

On a side note there was some chatter recently about an upcoming EMC XtremIO software update that apparently requires total data loss (or backup & restore) to perform. I suppose that is a sign that the platform is 1) not mature and 2) not designed right (not fully virtualized).

I told 3PAR management back at HP Discover that three years ago they could have counted me among the people who did not believe the 3PAR architecture would be able to adapt to this new era of all flash. I really didn’t have confidence at that time. What they’ve managed to accomplish over the past two years though has just blown me away, and gives me confidence their architecture has many years of life left in it. The main bit still missing is compression, though that is coming.

My new all flash array is of course a 7450 – to start with 4 controllers and ~27TB raw flash (16×1.92TB SSDs), a pair of disk shelves so I can go to as much as ~180TB raw flash (in 8U) without adding any shelves (before compression/dedupe of course). Cost per GB is obviously low(relative to their competition), performance is high(~105k IOPS @ 90% write in RAID 10 @ sub 1ms latency – roughly 20 fold faster than our existing 3PAR F200 with 80x15k RPM in RAID 5 — yes my workloads are over 90% write from a storage perspective), and they have the mature, battle hardened 3PAR OS (used to be named InformOS) running on it.

August 19, 2014

Sprint screwing their subscribers again

Filed under: Random Thought — Tags: — Nate @ 12:15 pm

As a former Sprint customer for more than a decade I thought this was interesting news.

My last post about Sprint was appropriately titled “Can Sprint do anything else to drive me away as a customer“. I left Sprint less because I did not like them/service/etc and really more because I wanted to use the HP Pre 3 which was GSM, which meant AT&T (technically I could have used T-Mobile but the Pre 3 didn’t support all of T-Mobile’s 3G frequencies, which meant degraded service coverage). So I was leaving Sprint regardless, but they certainly didn’t say or do anything that made me want to second guess that decision.

Anyway, today Sprint announces a big new fancy family plan that is better than the competition.

Except there is one glaring problem with this plan

[..]you’ll have to sign-up between Aug. 22 and Sept. 30, and current subscribers cannot apply.

Yeah, Sprint loves their customers.

On that note I thought this comment was quite interesting on El Reg:

[..]They combine Verizon-level arrogance with truly breath-taking incompetence into one slimy package. Their network stinks, it’s the slowest of the Big Four (and not by a small margin, either), their customer service makes Comcast look good[..]


August 16, 2014

Blog spam stats

Filed under: Random Thought — Nate @ 9:15 am

I just upgraded my Akismet plugin for the first time in a long time and this version gives me all sorts of fun stats about the spam that comes through here (they don’t count my posts as SPAM but maybe they should consider that).

Anyway, the first one was somewhat startling to me (perhaps it shouldn’t be, but it was anyway). I had to go back and look up when I told WordPress to close comments on posts older than 90 days. That was done entirely to limit the impact of spam; see the side bar, where I have a note about temporarily re-opening comments if you wish to comment on an older post.

So fortunately my apache logs go back far enough to show December 19 2013 as when I did this. Behold the impact!

Impact of disabling comments on posts older than 90 days

The last 5 months of 2013 generated 97,055 spam, while the first 8 months (so far) of 2014 have generated 6,360 spam (not even as much as August 2013 alone).
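
Turning those totals into monthly averages makes the drop easier to see. This is just arithmetic on the two numbers above; the function name is made up, and the 2014 figure treats a partial August as a full month, so the real 2014 rate is slightly understated.

```python
def monthly_rate(total: int, months: int) -> float:
    """Average spam comments per month over the given period."""
    return total / months


before = monthly_rate(97_055, 5)   # last 5 months of 2013
after = monthly_rate(6_360, 8)     # first 8 months (so far) of 2014
drop_pct = (1 - after / before) * 100

print(f"{before:.0f}/month before, {after:.0f}/month after, {drop_pct:.1f}% drop")
```

That works out to roughly 19,400 a month before and about 800 a month after, a drop of around 96%.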

Next up is the all time spam history, which just goes back to 2012. I guess they were not collecting specifics on stats before that; I have been a subscriber to this service for longer than that for sure.

TechOpsGuys spam all time

I’ve never really managed spam here; I rarely look at what is being blocked since there is so much (even now).

August 12, 2014

Some internet routers ran out of memory today

Filed under: Networking — Tags: , , — Nate @ 5:03 pm

(here is a link to in depth analysis on the issue)

Fortunately I didn’t notice any direct impact to anything I personally use. But I first got notification from one of the data center providers we use that they were having network problems. They traced it down to memory errors and frantically started planning for emergency memory upgrades across their facilities. My company does not and has never relied upon this data center for network connectivity so it never impacted us.

A short time later I noticed a new monitoring service that I am using sent out an outage email saying their service providers were having problems early this morning and they had migrated customers away from the affected data center(s).

Then I contacted one of the readers of my blog whom I met a few months ago and told him the story of my data center that is having this issue which sounded similar to a story he told me at the time about his data center provider. He replied with a link to this Reddit article which talks about how the internet routing table exceeded 512,000 routes for the first time today, and that is a hard limit in some older equipment which causes them to either fail, or to perform really slowly as some routes have to be processed in software instead of hardware.
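
A toy model of what that limit means, assuming the simple case where anything beyond the hardware table spills over to the slow software path (some boxes apparently just fall over instead). The constant comes from the 512,000 figure above; the function name and the example route count are made up for illustration.

```python
HARDWARE_ROUTE_LIMIT = 512_000  # hard limit on some older equipment, per the post


def place_routes(route_count: int, hw_limit: int = HARDWARE_ROUTE_LIMIT):
    """Split a routing table into (routes in hardware, routes left to software)."""
    in_hardware = min(route_count, hw_limit)
    in_software = route_count - in_hardware
    return in_hardware, in_software


# Hypothetical table just past the limit: the overflow lands on the CPU.
hw, sw = place_routes(512_500)
```

Even a small overflow matters, because traffic for those routes is handled by the CPU instead of the forwarding hardware.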

I also came across this article (which I commented on) which mentions similar problems but no reference to BGP or routing tables (outside my comments at the bottom).

[..]as part of a widespread issue impacting major network providers including Comcast, AT&T, Time Warner and Verizon.

One of my co-workers said he was just poking around and could find no references to what has been going on today other than the aforementioned Reddit article. I too am surprised if so many providers are having issues that this hasn’t made more news.

(UPDATE – here is another article from zdnet)

I looked at the BGP routing capacity of some core switches I had literally a decade ago and they could scale up to 1 million unique BGP4 routes in hardware, and 2 million non unique (not quite sure what the difference is; anything beyond static routing has never been my thing). I recall seeing routers, again many years ago, that could hold probably 10 times that (I think the main distinction between a switch and a router is the CPU and memory capacity, at least for the bigger boxes with dozens to hundreds of ports?).

So it’s honestly puzzling to me how any service provider could be impacted by this today, and how any equipment not capable of handling 512k routes is still in use in 2014 (I can understand it for smaller orgs but not service providers). I suppose this also goes to show that there is a widespread lack of monitoring of these sorts of metrics. In the Reddit article there is mention of talks going on for months, so people knew this was coming. Well, apparently not everyone.

Someone wasn’t watching the graphs.
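
Watching the graphs here boils down to alerting on table utilization before it hits the ceiling. A minimal sketch of such a check, where the function name and the 90% warning threshold are my own assumptions and not anything from a specific monitoring product:

```python
def table_alert(route_count: int, capacity: int, warn_at: float = 0.90) -> str:
    """Classify routing table utilization as ok, warning or critical."""
    used = route_count / capacity
    if used >= 1.0:
        return "critical"   # table is full or overflowing
    if used >= warn_at:
        return "warning"    # time to plan the upgrade
    return "ok"
```

With a 512,000 route capacity, a check like this would have been warning from about 460,800 routes onward, well before the table filled up.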

I’m planning on writing a blog post on the aforementioned monitoring service I recently started using soon too, I’ve literally spent probably five thousand hours over the past 15 years doing custom monitoring stuff and this thing just makes me want to cry it’s so amazingly powerful and easy to use. In fact just yesterday I had someone email me about a MRTG document I wrote 12 years ago and how it’s still listed on the MRTG site even today (I asked the author to remove the link more than a year ago that was the last time someone asked me about it, that site has been offline for 10 years but is still available in the internet archive).

This post was just a quickie inspired by my co-worker who said he couldn’t find any info on this topic, so hey maybe I’m among the first to write about it.
