Sorry for slacking off recently; there just hasn’t been a whole lot out there that has gotten me fired up.
Not too long ago I ranted a bit about outages, basically saying that if your site is down for a few hours, big whoop. It happens to everyone. The world is not going to end, and you’re not going to go out of business.
Now if your website is down for a week or multiple weeks, the situation is a bit different. I saw on a news broadcast that experts had warned the White House that the new $600M+ healthcare.gov website was not ready. But the people leading the project, as seems so typical, probably figured the claims were overblown (are they ever? In my experience they have not been, though I’ve never been involved in a $600M project before, or anywhere close to it) and decided to press onwards regardless.
So they had some architecture issues, some load issues, capacity problems, etc. I just thought to myself: this problem really sounds easy to solve from a technical standpoint. They apparently tried to manage the load to some extent (and failed) with various waiting screens. There are some recent reports that longer-term fixes may take weeks to months.
I’ve been on the receiving end of some pretty poorly written/designed applications where it didn’t really matter how much hardware you had; they flat out wouldn’t scale. I remember one situation in particular, during an outage of some kind, when the VP of Engineering interrupted us on the conference call and said, “Guys, is there anything I can buy that would make this problem go away?” The answer back to him was no. At this same company we had Oracle (obviously a big company in the database space) come in and tell us they had no other customers in the world doing what we were doing, and they could not guarantee results. Storage companies were telling us the same thing. Our OLTP database at the time was roughly 8 times the size of the next largest Oracle OLTP database in the world (which was Amazon’s). That was, by far, the most over-designed application I’ve ever supported. It was an interesting experience, and I learned a lot. Most other applications that I have supported suffered pretty serious design issues, though none were quite as bad as the one at this company.
My solution is simple – go old school, take a number and notify people when they can use the website.
Write a little basic app, point healthcare.gov to it, and allow people to register with really basic info like name and email address (or phone number if they prefer to use SMS). This would be an entirely separate application, not part of the regular website. It is a really lightweight application; perhaps even store the data in some NoSQL solution (for speed), because worst case, if you lose the data, people will just have to come back and register again.
As part of the registration, the site would say: we’ll send you an email or SMS with a code when your turn is up, and you’ll have a 24-hour window in which to use the site (past that, you have to register for a new number). If they can get the infrastructure done, perhaps they could even have an automated phone system give people a call as well.
Then simply allow only a fraction of the number of people the system can handle onto the website at a time. If they built it for 50,000 people at a time, I would probably start with 20,000 for the first day or two and see how it goes (20,000 people per day, not 20,000 simultaneous). Then ramp it up if the application is scaling OK. As users register successfully on the main site, the take-a-number application sees this and sends the next wave of notifications. Recently I heard that officials were recommending people sign up through the call center(s), which I suppose is an OK stopgap, but I can’t imagine the throughput is very high there either.
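To make the take-a-number part concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the names TakeANumber, Ticket, register and call_turn are made up, the data lives in a plain in-memory list rather than a real data store, and the actual email or SMS delivery is left out.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(hours=24)  # the 24-hour window described above


@dataclass
class Ticket:
    number: int                            # place in line
    contact: str                           # email address or phone number
    code: Optional[str] = None             # filled in when the turn comes up
    expires_at: Optional[datetime] = None  # end of the 24-hour window


class TakeANumber:
    """Hypothetical in-memory registration app; a real one would persist tickets."""

    def __init__(self):
        self.tickets = []

    def register(self, contact):
        # Hand out the next number in line; no code is issued yet.
        ticket = Ticket(number=len(self.tickets) + 1, contact=contact)
        self.tickets.append(ticket)
        return ticket

    def call_turn(self, ticket, now):
        # Generate a random code and start the 24-hour clock.
        # A real app would send the email or SMS at this point.
        ticket.code = secrets.token_urlsafe(8)
        ticket.expires_at = now + WINDOW
        return ticket
```

Someone who misses the window would simply call register again and get a new number at the back of the line.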
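The wave logic could look something like the Python sketch below. The function names, the wave size, the 25% ramp step and the upper limit are all assumptions of mine; the only numbers from above are the idea of a daily cap that starts low and gets raised when the application holds up.

```python
from collections import deque


def next_wave(waiting, daily_cap, notified_today, wave_size):
    """Take the next batch of people to notify without exceeding the daily cap.

    waiting is a deque of ticket numbers in the order people registered.
    """
    room = max(0, daily_cap - notified_today)
    count = min(wave_size, room, len(waiting))
    return [waiting.popleft() for _ in range(count)]


def ramp_cap(daily_cap, healthy, max_cap, step=1.25):
    """Raise the daily cap by a (hypothetical) 25% if the site held up."""
    if not healthy:
        return daily_cap
    return min(max_cap, int(daily_cap * step))
```

For example, starting at 20,000 per day, ramp_cap(20000, True, 100000) gives 25,000 for the next day, while an unhealthy day leaves the cap where it was.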
I figure it may take a team of developers a few days to come up with such an app.
Shift the load of people trying to hit an expensive application over and over again to a really basic high performance registration application, and put the expensive application behind a barrier requiring an authentication code.
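The barrier itself can be a very cheap check in front of the expensive application. Here is a minimal Python sketch, assuming the issued codes are available as a simple mapping from code to expiry time; the gate function and the three outcome strings are hypothetical names, not anything from the real site.

```python
from datetime import datetime


def gate(code, issued_codes, now):
    """Decide whether a request may reach the expensive application.

    issued_codes maps each issued code to the datetime its window closes.
    """
    expires_at = issued_codes.get(code)
    if expires_at is None:
        # Unknown or missing code: send the person to the take-a-number app.
        return "redirect_to_registration"
    if now >= expires_at:
        # Window has passed: they need a new number.
        return "expired_register_again"
    return "forward_to_application"
```

The point of the design is that a rejected request costs one dictionary lookup instead of a trip through the transactional system.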
IMO they should have done this from the beginning, perhaps even generating times in advance based on Social Security numbers or something.
All of this is really designed to manage the flood of initial registrations. Once the tidal wave is handled, open the website up without requiring the authentication code anymore.
There should also be a separate, static, high-speed site (on many CDNs) that has all of the information people would need to know when signing up; again, something that is not directly connected to the transactional system. People can review this info in advance, and that would make sign-ups faster.
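One way the "or something" might work, purely as a sketch: spread people across days by the last digit of the number. The signup_day name, the ten-day spread and the use of the last digit are all my assumptions for illustration.

```python
from datetime import date, timedelta


def signup_day(ssn, start, days=10):
    """Assign a sign-up date from the last digit of the number (hypothetical scheme)."""
    digits = [c for c in ssn if c.isdigit()]
    if not digits:
        raise ValueError("no digits found")
    bucket = int(digits[-1]) % days
    return start + timedelta(days=bucket)
```

With a start date of October 1, a number ending in 9 would land on October 10 and one ending in 0 on October 1.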
Nate, I was listening to stories about this all week on NPR and I swear, at one point I said to myself, “I wonder what a dude like Amsden would do to fix this.”
A pragmatic solution that sales / marketing types would hate!
Comment by Dan — October 13, 2013 @ 5:51 pm
Hey Dan! thanks! I was thinking the “administration” folks may not like that approach either, wonder if they can come up with a better plan….
Comment by Nate — October 14, 2013 @ 9:26 am
Maybe BlackBerry read my post
from their new BBM launch
Comment by Nate — October 21, 2013 @ 3:31 pm