A co-worker pointed this video out to me; it's from a person at Microsoft Research, who starts out by saying the views are his own and do not represent Microsoft.
Cloud comes up at 5:36 into the video. The whole video is good (30 minutes), everything is spot on, and he manages to do it in a very entertaining way.
He does a good job of explaining some of the horrors around cloud services. Myself, I still literally suffer from PTSD from the few years I spent with the Amazon cloud. That is not an exaggeration or a joke; it is real.
Sorry for not having posted recently - I have seen nothing that has gotten me interested enough to post about anything; tech has been pretty boring. I did attend HOOTERPALOOZA 2014 last night, though, and that was by far the best time I've had in the Bay Area since I returned to California almost three years ago. I will be writing a review of that, along with pictures and video, soon; it will take a day or two to sort through everything.
The above video was so great, though, that I wanted to post it here.
I guess I have to hand it to these folks, as they are up front about it. I just came across this little gem, where a Seattle-area startup talks about a nearly $6 million loss they are taking for 2013, and what caught my eye more than anything else is their cloud spend.
They spent 25% of their REVENUE on cloud services in 2013 (which for them comes to just over $7 million).
REVENUE, 25% of REVENUE. Oh. my. god.
Now I shouldn't be surprised, having been at a company that was doing just that (that company has since collapsed and was acquired by a Chinese group recently), and I know many other companies that are massively overspending on cloud because they are simply clueless.
It is depressing.
What's worse is that it makes everyone else's life harder, because people read articles about public cloud and crap, and they see all these companies signing up and spending a lot, so they think it is the right thing to do when more often than not (far more often than not) it is the wrong strategy. I won't go into specifics AGAIN on when it is good or not; that is not the point of this post.
The signal-to-noise ratio of people moving OUT of public cloud vs. going INTO it is still way off; rarely do you hear about companies moving out, or why they moved out. I've talked to a BUNCH of companies over recent years who have moved out of public clouds (or feel they are stuck in their cloud), but those stories never seem to reach the press for some reason.
The point of this post is to illustrate how absurd some of the spending on cloud is out there. I am told this company in particular is apparently building their own cloud now; I guess they saw the light.
My company moved out of public cloud about two years ago, and we have had great success ever since; the $$ saved is nothing compared to the improved availability, flexibility, and massive ease of use over a really poor public cloud provider.
Oh, as a side note: if you use Firefox I highly encourage you to install this plugin; it makes reading about cloud more enjoyable. I've had it for months now and I love it. I believe there is a version for Chrome as well.
Last week Verizon made big news in the cloud industry by announcing that they were shifting gears significantly and were not going to build their clouds on top of traditional enterprise equipment from the likes of HP, Cisco, EMC, etc.
I can't find an article on it, but I recall hearing on CNBC that AT&T announced something similar - something that was going to result in them saving $2 billion over some period of time that I can't remember.
Today our friends at The Register reveal that this design win actually goes to AMD's SeaMicro unit. AMD says they have been working closely with Verizon for two years on designs for a highly flexible and efficient platform to scale with.
SeaMicro has a web page dedicated to this announcement.
Some of the capabilities include:
- Fine-grained server configuration options that match real-life requirements, not just small/medium/large sizing, including processor speed (500 MHz to 2,000 MHz) and DRAM (0.5 GB increments) options
- Shared disks across multiple server instances versus requiring each virtual machine to have its own dedicated drive
- Defined storage quality of service by specifying performance up to 5,000 IOPS to meet the demands of the application being deployed, compared to best-effort performance
- Strict traffic isolation, data encryption, and data inspection with full featured firewalls that achieve Department of Defense and PCI compliance levels
- Reserved network performance for every virtual machine up to 500 Mbps
I don't see much more info than that. The questions that remain for me are what level of SMP they will support, and what processor(s) they are using (specifically, AMD or Intel, since SeaMicro can use both; Intel has obviously been dominating the cloud landscape, so it would be nice to see a new large-scale deployment of AMD).
I have written about SeaMicro a couple of times in the past, most recently comparing HP's Moonshot to the AMD platform. In those posts I mentioned how I felt that Moonshot fell far short of what SeaMicro seems to be capable of offering. Given Verizon's long history as an HP customer, I can't help but assume that HP tried hard to get them to consider Moonshot but fell short on the technology (or timing, or both).
SeaMicro, to my knowledge (I don't follow micro servers too closely), is the only micro server platform that offers fully virtualized storage, both inside the chassis as well as more than 300TB of external storage. One of the unique abilities that sounds nice for larger-scale deployments is the ability to export essentially read-only snapshots of base operating systems to many micro servers for easier management (and, you could argue, better security, given they are read only), without needing fancy SAN storage. It's also fairly mature (relative to the competition), given it's been on the market for several years now.
Verizon/Terremark obviously had some trouble competing with the more commodity players with their enterprise platform, both on cost and on capabilities. I was a vCloud Express user for about a year, and worked through an RFP with them at one of my former companies for a disaster recovery project. Their cost model, like that of most cloud providers, was pretty insane. Our assumption at the time was that, as a small company without much purchasing leverage, we could expect their cost to be pretty decent given the volumes a cloud provider can command. Reality set in quickly when their cost was at least 5-6 times what ours would have been for the same capabilities from similar enterprise vendors.
Other providers had similar pricing models, and I continue to hear stories to this day about various providers costing too much relative to doing things in-house (there really is no exception), with in-house ROIs never exceeding 12 months. I think I've said it many times, but I'll say it again: I'll be the first one willing to pay a premium for something that gives premium abilities. None of them come close to meeting that, though. Not even in the same solar system at this point.
This new platform will certainly make Verizon's cloud offering more competitive, though they are having to build an entirely new control platform for it - there isn't much off-the-shelf software here, simply because none of it is built for that level of scale. Such problems are difficult to address, and until you encounter them you probably won't anticipate what is required to solve them.
I am mainly curious whether these custom things AMD built for Verizon will be available to other cloud players. I assume they will.
I came across this article on LinkedIn which I found very interesting. The scenario given by the article was a professional photographer who had 500GB of data to back up and decided to try Carbonite.
The problem was that Carbonite apparently imposes significant throttling on users uploading large amounts of data -
[..]At that rate, it takes nearly two months just to upload the first 200GB of data, and then another 300 days to finish uploading the remaining 300GB.
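The quoted figures imply the throttle gets harsher the more you upload. A quick back-of-envelope check (the 200GB/two-month and 300GB/300-day numbers come straight from the article; the GB-to-megabit conversion assumes decimal gigabytes):

```python
# Effective upload rates implied by the article:
# ~200GB in "nearly two months", then the remaining 300GB in ~300 days.

def gb_per_day_to_mbps(gb_per_day):
    # decimal GB/day -> megabits per second
    return gb_per_day * 8 * 1000 / 86400

first_tier = 200 / 60    # GB/day for the first ~two months
second_tier = 300 / 300  # GB/day once you're past 200GB

print(f"first tier: ~{first_tier:.2f} GB/day (~{gb_per_day_to_mbps(first_tier):.2f} Mbit/s)")
print(f"past 200GB: ~{second_tier:.2f} GB/day (~{gb_per_day_to_mbps(second_tier):.2f} Mbit/s)")
```

That works out to roughly a third of a megabit per second at first, dropping to under a tenth - on any modern broadband connection the bottleneck is entirely on their end.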
Which takes me back to a conversation I was having with my boss earlier in the week about why I decided to buy my own server and put it in a co-location facility, instead of using some sort of hosted thing.
I have been hosting my own websites, email, etc. since about 1996. At one point I was hosted on T1s at an office building, then I moved things to my business-class DSL at home for a few years, then when that was no longer feasible I got a used server and put it up at a local colo in Seattle. Then I decided to retire that old server (built in 2004) and spent about a year in the Terremark vCloud, before buying a new server and putting it up at a colo in the Bay Area, where I live now.
My time in the Terremark cloud was OK; my needs were pretty minimal, but I didn't have a lot of flexibility (due to the costs). My bill was around $120/mo or something like that for a pair of VMs. Terremark operates in a Tier 4 facility and doesn't use the built-to-fail model I hate so much, so I had confidence things would get fixed if they ever broke, and I was willing to pay some premium for that.
Cloud or self hosting for my needs?
I thought hard about whether or not to invest in a server+colo again or stay on some sort of hosted service. The server I am on today was $2,900 when I bought it, which is a decent amount of money for me to toss around in one transaction.
Then I had the idea of storing data off-site. I don't have much that is critical - mostly media files and stuff that would take a long time to rebuild in case of a major failure. But I wanted something that could do at least 2-3TB of storage.
So I started looking into what this would cost in the cloud. I was sort of shocked, I guess you could say. The cost for regular, protected cloud storage was easily going to be more than $200/mo for 3TB of usable space.
Then there are backup providers like Carbonite, Mozy, Backblaze, etc. I read a comment on Slashdot (I think it was about Backblaze) and was pretty surprised to then read their fine print -
Your external hard drives need to be connected to your computer and scanned by Backblaze at least once every 30 days in order to keep them backed up.
So the data must be scanned at least once every 30 days or it gets nuked.
They also don't support backing up network drives. Most of the providers of course don't support Linux either.
The terms do make sense to me - I mean, it costs $$ to run, and they advertise unlimited, so I don't expect them to store TBs of data for only $4/mo. It just would be nice if they (and others) were clearer about their limitations up front. At least, unlike the person in the article above, I was able to make a more informed decision.
The only real choice: Host it myself
So the decision was really simple at that point: go invest and do it myself. It's sort of ironic if you think about it, all this talk about cloud saving people money. Here I am, just one person with no purchasing power whatsoever, and I am saving more money doing it myself than some massive-scale service provider can offer it for.
The point wasn't just the storage though. I wanted something to host:
- This blog
- My email
- My other websites / data
- Ideally, a place to experiment/play as well
So I bought this server, which has a single-socket quad-core Intel chip, originally with 8GB (now 16GB) of memory, and 4x2TB SAS disks in RAID 1+0 (~3.6TB usable) with a 3Ware hardware RAID controller (I've been using 3Ware since 2001). It has dual power supplies (though both are connected to the same source; my colo config doesn't offer redundant power). It even has out-of-band management with full video KVM and virtual media options. Nothing like the quality of HP iLO, but far better than what a system at this price point could offer a few years ago.
On top of that I am currently running six VMs, five of them active:
- VM #1 runs my personal email, DNS, websites, this blog, etc.
- VM #2 runs email, DNS, websites, etc. for a few friends and for former paying customers (not sure how many are left) from an ISP we used to run many years ago
- VM #3 is an OpenBSD firewall running in layer 3 mode; it also provides a site-to-site VPN to my home, as well as an end-user VPN for my laptop when I'm on the road
- VM #4 acts as a storage/backup server for my home data with a ~2TB file system
- VM #5 is a Windows VM in case I need one of those remotely. It doesn't get much use.
- VM #6 is the former personal email/DNS/website server, which ran a 32-bit OS. I'm keeping it around on an internal IP for a while in case I come across more files that I forgot to transfer.
There is an internal and an external network on the server; the site-to-site VPN of course provides unrestricted access to the internal network from remote locations, which is handy since I don't have to rely on external IPs to run additional things. The firewall also does NAT for devices that are not on external IPs.
As you might expect, the server sits at low CPU usage 99% of the time, and it's running at around 9GB of used memory, so I can toss on more VMs if needed. It's a very flexible configuration.
When I got the server, I originally decided to host it with the company I bought it from, and they charged me $100/mo to do it. Unlimited bandwidth, etc. - a good deal (also free on-site support)! The first thing I did was take the server home and copy 2TB of data onto it. Then I gave it back to them and they hosted it for a year for me.
Then they gave me the news that they were terminating their hosting business and I had only two weeks to get out. I evaluated my options and decided to stay at the same facility, but started doing business with the facility itself (Hurricane Electric). The downside was that the cost doubled to $200/mo for the same service (100Mbit unlimited with 5 static IPs), since I was no longer sharing the service with anyone else. I did get a third of a rack, though, not that I can use much of it due to power constraints (I think I only get something like 200W). But in the grand scheme of things it is a good deal - it's a bit more than double what I was paying in the Seattle area, but I am getting literally 100 times the bandwidth. That gives me a lot of opportunities to do things. I've yet to do much with it beyond my original needs, though that may change soon.
Now granted, it's not high availability - I don't have 3PAR storage like Terremark did when I was a customer, and I have only one server, so if it's down, everything is down. It's been reliable, though, providing really good uptime over the past couple of years. I have had to replace at least two disks, and I also had to replace the USB stick that runs vSphere; the previous one seemed to have run out of flash blocks, as I could no longer write much to the file system. That was a sizable outage for me, as I took the time to install vSphere 5.1 (from 4.x) on the new USB stick, re-configure things, and upgrade the memory all in one day - probably 4-5 hours, I think. I'm connected to a really fast backbone, and the network has been very reliable (not perfect, but easily good enough).
So my server was $2,900, and I pay currently $2,400/year for service. It's certainly not cheap, but I think it's a good deal still relative to other options. I maintain a very high level of control, I can store a lot of data, I can repair the system if it breaks down, and the solution is very flexible, I can do a lot of things with the virtualization as well as the underlying storage and the high bandwidth I have available to me.
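For what it's worth, here is the rough break-even math, using the ~$200/mo cloud storage quote from earlier in this post and my old ~$120/mo Terremark bill as a stand-in for what a pair of small cloud VMs might cost (that VM figure is an assumption based on my old bill, not a current quote):

```python
# Rough multi-year cost comparison: my colo setup vs. roughly equivalent cloud services.
SERVER = 2900                      # one-time hardware cost
COLO_PER_YEAR = 2400               # $200/mo at Hurricane Electric

CLOUD_STORAGE_PER_YEAR = 200 * 12  # ~3TB protected cloud storage quote
CLOUD_VMS_PER_YEAR = 120 * 12      # assumed: pair of small VMs (old Terremark bill)

for years in (1, 2, 3):
    colo = SERVER + COLO_PER_YEAR * years
    cloud = (CLOUD_STORAGE_PER_YEAR + CLOUD_VMS_PER_YEAR) * years
    print(f"year {years}: colo ${colo:,} vs cloud ${cloud:,}")
```

By that math the colo route breaks even somewhere around year two, and that ignores the 100Mbit of unlimited bandwidth and the extra VMs the box runs at no additional cost.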
Which brings me to next steps. Something I've always wanted to do is make the data more mobile; that is one area where it was difficult (or impossible) to compete with cloud services, especially on things like phones and tablets, since they have the software R&D to make those "apps" and other things.
I have been using WebOS for several years now, which of course runs on top of Linux, though the underlying Linux OS is really too minimal to be of any use to me. It's especially useless on the phone, where I am just saddened that there has never been a decent terminal emulator app released for WebOS. Of all the things that could be done, that one seems really trivial, but it never happened (that I could see; there were a few attempts but nothing usable as far as I could tell). On the TouchPad things were a little different: you could get an xterm and it was kind of usable, significantly more so than on the phone. But the large overhead of X11 just to get a terminal seemed quite wasteful, and I never really used it very much.
So I have this server, and all this data sitting on a fast connection, but I didn't have a good way to get to it remotely unless I was on my laptop (except for the obvious things like the blog, which are web accessible).
Time to switch to a new mobile platform
WebOS is obviously dead (RIP). In the early days after HP terminated the hardware unit I was holding out some hope for the software end of things, but that hope has more or less dropped to zero now; nothing remains but disappointment over what could have been. I think LG acquiring the WebOS team was a mistake, and even though they've announced a WebOS-powered TV to come out early next year, honestly I'll be shocked if it hits the market. It just doesn't make any sense to me to run WebOS on a TV outside of having a strong ecosystem of other WebOS devices to integrate with.
So as reality continued to set in, I decided to think about alternatives: what was going to be my next mobile platform? I don't trust Google and don't like Apple. There are BlackBerry and Windows Phone as the other major brands in the market, but I really haven't spent any time on any of those devices, so I suppose I won't know for sure. I did feel that Samsung had been releasing some pretty decent hardware and software (based only on things I have read), and they obviously have good market presence. Some folks complain, etc., but if I were to go to a Samsung Android platform I probably wouldn't have an issue. Those complaining about their platform probably don't understand the depression that WebOS has been in since about six months after it was released - so really, anything relative to that is a step up.
I mean, I can't even read my personal email on my WebOS device without using the browser. Using webmail via the browser on WebOS is, for me at least, a last resort; I don't do it often because it's really painful (I bought some mobile-optimized skins for the webmail app I use, only to find they are not compatible with WebOS, so on WebOS I use a basic HTML webmail app - it gets the job done, but..). The reason I can't use the native email client is, I suppose, in part my fault: I have probably 200 email addresses, and many of them go directly to different inboxes. I use Cyrus IMAP, and my main account subscribes to these inboxes on demand. If I don't care about an email address I unsubscribe, and it continues to get email in the background. WebOS doesn't support accessing folders via IMAP outside of the INBOX structure of a single account, so I'm basically SOL for accessing the bulk of my email (which doesn't go to my main INBOX). I have no idea if Samsung or Android works any differently.
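For what it's worth, the subscribe/unsubscribe trick is plain IMAP (the LSUB command lists only subscribed folders), so in principle any client that honors subscriptions could cope with this setup. A minimal sketch using Python's standard imaplib - the hostname and credentials are placeholders, and the parsing assumes conventional LSUB response lines like Cyrus emits:

```python
import re

def mailbox_name(lsub_line: bytes) -> str:
    """Extract the mailbox name from one untagged LSUB response line,
    e.g. b'(\\HasNoChildren) "." "INBOX.lists.linux"' -> 'INBOX.lists.linux'."""
    text = lsub_line.decode("utf-8", "replace")
    # flags in parens, quoted hierarchy delimiter, then the (possibly quoted) name
    m = re.match(r'\([^)]*\)\s+"[^"]*"\s+"?(.*?)"?$', text)
    if not m:
        raise ValueError(f"unparseable LSUB line: {text!r}")
    return m.group(1)

# Against a live server (placeholder host/credentials, not my real setup):
# import imaplib
# conn = imaplib.IMAP4_SSL("mail.example.com")
# conn.login("user", "password")
# typ, data = conn.lsub()  # only the folders this account is subscribed to
# print([mailbox_name(line) for line in data])
```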
I suppose my personal favorite problem is not being able to use Bluetooth and 2.4GHz WiFi at the same time on my phone. The radios conflict, resulting in really poor quality over Bluetooth, WiFi, or both. So WiFi stays disabled the bulk of the time on my phone, since most hotspots seem to only do 2.4GHz and I use Bluetooth almost exclusively when I make voice calls.
There are tons of other pain points for me on WebOS, and I know they will never get fixed; those are just a couple of examples. WebOS is nice in other ways, of course - I love the Touchstone (inductive charging) technology, for example, and the cards multitasking interface is great too (though I don't do heavy multitasking).
So I decided to move on. I was thinking Android. I don't trust Google but, ugh, it is Linux-based and I am a Linux user (I do have some Windows too, but my main systems - desktops and laptops - are all Linux), and I believe Windows Phone and BlackBerry would likely (no, certainly) not play as well with Linux as Android does. (WebOS plays very well with Linux: just plug it in and it becomes a USB drive, no restrictions - rooting WebOS is as simple as typing a code into the device.) There are a few other mobile Linux platforms out there - I think MeeGo(?) might be the biggest one trying to make a comeback, then there are FirefoxOS and Ubuntu phone - all of which feel less viable (in today's market) than WebOS did back in 2009.
So I started thinking more about leaving WebOS, and I think the platform I will go to is the Samsung Galaxy Note 3, at some point after it comes out (I have read ~9/4 for the announcement, or something like that). It's much bigger than the Pre3, but not too much heavier (the Note 2 is ~30g heavier). Obviously there's no dedicated keyboard, but I think the larger screen will do well for typing with my big hands. The Samsung multimedia/multitasking stuff sounds interesting (the ability to run two apps at once, at least Samsung apps).
I do trust Samsung more than Google, mainly because Samsung wants my $$ for their hardware. Google wants my information for whatever it is they do..
I'm more than willing to trade money in a vain attempt to maintain some sort of privacy. In fact I do it all the time, and I suppose that could be why I don't get much spam to my home address (snail mail). I also very rarely get phone calls from marketers (low single digits per year, I think), even though I have never signed up for any do-not-call lists (I don't trust those lists).
Then I came across this comment on Slashdot -
Well I can counter your anecdote with one of my own. I bought my Galaxy S3 because of the Samsung features. I love multi-window, local SyncML over USB or WiFi so my contacts and calendar don't go through the "cloud", Kies Air for accessing phone data through the browser, the Samsung image gallery application, the ability to easily upgrade/downgrade/crossgrade and even load "frankenfirmware" using Odin3, etc. I never sign in to any Google services from my phone - I've made a point of not entering a Google login or password once.
So, obviously, I was very excited to read that.
Next up - and this is where the story comes back around to online backup, cloud, my colo, etc. I didn't expect this post to be so long, but it sort of got away from me again.
I think it was actually in another Slashdot comment thread (I read Slashdot every day, but I've never had an account and I think I've only commented maybe three times since the late 90s) where someone mentioned the software package ownCloud.
Just looking at the features once again got me excited. They also have Android and iOS apps. So, in theory, from a mobile perspective this would allow me to access files and sync contacts, music, video, and perhaps even a calendar (not that I use one outside of work, which is Exchange), while keeping control over all of it myself. There are also desktop sync clients (à la Dropbox or something like that?) for Linux, Mac, and Windows.
So I installed it on my server - it was pretty easy to set up - and pointed it at my 2TB of data, and off I went. I installed the desktop sync client on several systems (Ubuntu 10.04 was the most painful to install on; I had to compile several packages from source, but it's nothing I haven't done a million times before on Linux). The sync works well. I had to remove the default sync rule, which was to sync everything; at first it was trying to sync the full 2TB of data and kept failing - not that I wanted to sync that much - so I configured new sync directives for specific folders.
So that's where I'm at now: still on WebOS, waiting to see what comes of the new Note 3 phone. I believe I saw that for the Note 2 there was even a custom back cover which allowed for inductive charging.
It's sad to think of the $$ I dumped on WebOS hardware in the period of panic following the termination of the hardware division; I try not to think about it. The TouchPads do make excellent digital picture frames, especially when combined with a Touchstone charger. I still use one of my TouchPads daily (I have three), and my phone daily as well, though my data usage is quite small on the phone since there really isn't a whole lot I can do on it, unless I'm traveling and using it as a mobile hotspot.
whew, that was a lot of writing.
Of note, two big cloud companies were on the list with multiple outages - Amazon with at least three and Azure right behind it at two. Outages have been a blight on both services for years.
I don't know about you, but short of a brief time at a poor hosting facility in Seattle (I joined a company in spring of 2006 that was hosted there, and we moved out by fall of 2006; we did go through one power outage while I was there, if I recall right), the number of infrastructure-related outages I've been through over the past decade has been fairly minimal compared to the number experienced by these cloud companies. The number of application-related outages (and total downtime minutes incurred by said applications) outnumbers infrastructure-related ones for me, I'd say, by at least 1,000:1.
Amazon has had far more downtime for companies I have worked for (either before or since I was there) than any infrastructure-related outages at companies where we hosted our own stuff. I'd say it's safe to say an order of magnitude more outages. Of course, not all of these are called outages by Amazon; they leave themselves enough wiggle room in their SLAs to drive an aircraft carrier through. My favorite was probably the forced reboot of their entire infrastructure.
Unlike infrastructure related outages at individual companies, obviously these large service provider outages have much larger consequences for very large numbers of customers.
Speaking of cloud, I heard that HP recently brought their own cloud platform out of beta. I am not a fan of this cloud either; basically they tried to clone what Amazon is doing, which infrastructure-wise is a totally 1990s way of doing things (with APIs on top to make it feel nice). Wake me up when these clouds gain the ability to pool CPU/memory/storage and to dynamically configure systems without fixed configurations.
If the world happens to continue on after December 22nd @ 3:11AM Pacific time, and I don't happen to see you before Christmas - have a good holiday from all of us monkeys at Techopsguys.
(originally I had this on the post above this but I thought it better to split it out since it morphed into something that suited a dedicated post)
I dug into their web site a bit, and they really seem to have some interesting technology. They are based in Europe, but have a U.S. data center somewhere as well. They claim more than 1,000 customers, and well over 100 engineers working on the software.
While ProfitBricks does not offer pooling of resources, they do have several key architectural advantages that other cloud offerings I've come across lack:
- Entirely Infiniband-based network with 80Gbps of throughput per server
- Double redundant high speed storage w/SSD caching that is persistent!
- Up to 48 CPU cores and 196GB memory/server
- Fine grained provisioning in increments of 1 CPU core, 1GB of storage and fractional GB of memory
- User defined networking - optional multiple VLANs, optional multiple NICs on the servers, all of it dynamic
- Data center designer - drag & drop your data center design - point and click your configuration.
They really did a good job, at least on paper. I haven't used the service, though I did play around with their data center designer.
Their load balancing offering appears to be quite weak (weaker than Amazon's own), but you can deploy a software load balancer like Riverbed Stingray (formerly Zeus). I emailed them about this and they are looking into Stingray; perhaps they can get a partnership going and offer it with their service. Amazon has recently improved its load balancing partnerships, and you can now run at least Citrix NetScaler and A10 Networks' SoftAX in EC2, in addition to Riverbed Stingray. Amazon's own Elastic Load Balancer is worse than useless in my experience; I'd rather rely on external DNS-based load balancing from the likes of Dynect than use ELB. Even with Stingray it can take several seconds (up to about 30) for the system to fail over with Elastic IPs, versus normally sub-second failover when you're operating your own infrastructure.
Anyway, back to ProfitBricks. I was playing around with their designer tool and was not sure how best to connect servers that would be running load balancers (assuming they don't provide the ability to do IP takeover). I thought maybe to have one LB in each zone and advertise both data center IP addresses (this is a best practice in any case, at least for larger providers), though in the above I simplified it a bit to a single internet access point, using one of ProfitBricks' round-robin load balancers to distribute layer 4 traffic to the servers behind it (running Stingray). Some real testing, and further discussions, would of course have to happen before I'd run production stuff on it (and I have no need for IaaS cloud right now anyway).
So they have all this, and still their pricing is very competitive. They also claim a very high level of support, which is good to see.
I'll certainly keep them in mind in the event I need IaaS in the future; they seem to know the failings of first-generation cloud companies and are doing good things to address them. Now if they could only address the lack of resource pooling, I'd be really happy!
Saw an interesting article over at Slashdot, then went to GigaOm, and then to the source. Aside from the sick feeling I got at a cloud storage provider sourcing their equipment through Costco or Newegg, the more interesting aspect of the Backblaze story, one I wasn't aware of before, is the people in the Slashdot thread pointing out the limitations of their platform.
Here I was thinking Backblaze is a cheap way to store stuff off-site, but it's a lot more complicated than that. For my own off-site backups I use my own hardware hosted in a nearby co-location facility. The cost is reasonable considering the flexibility I have (it seems far cheaper than any cloud storage I have come across, which honestly surprises me given I have no leverage to buy hardware).
Anyway, back to Backblaze: the model really reminds me of the one so many people complain about when it comes to broadband and wireless data usage plans. The price is really cheap - they did that part well.
The most startling thing to me is that they delete data 30 days after you delete it locally. They don't allow storage of many common types of files, like ISO images and virtual machine images. They acknowledge -
Backblaze is not designed as an additional storage system when you run out of space.
(Having written this, I could just as easily see consumer internet companies saying that they are not designed to let you replace your cable/satellite TV with Netflix+Hulu.)
At the same time, they advertise unlimited storage. They talk about how much cheaper they are than (shudder) Amazon and other providers (as well as doing things in-house), but don't mention these key comparison points. I believe one of the posts on Slashdot even claimed that Backblaze goes out of its way to detect network drives, and perhaps iSCSI-attached network storage, and blocks those from being backed up as well.
For all I know the other players in the space have similar terms; I haven't investigated. I was just kind of surprised to see such mixed messages coming from them: on one side they say unlimited storage for a cheap rate, while at the same time they put all sorts of restrictions on it.
The up side is of course that they seem to be fairly up front about what they limit when you dig more into the details. But then the broadband and wireless data providers are fairly up front as well, and that doesn't stop people from complaining at ever increasing volumes.
I'd think they could do a lot more if they expanded the scope of their service with tiers, for example extending the retention window from 30 days to some arbitrarily longer period for some marginal increase in cost. But maybe not.
I'm sure they run a fine service for the target market. I was always sort of curious how they managed the cost model, outside of the hardware anyways; reading this today really enlightened me as to how that strategy works.
Learn something new every day (almost).
I've seen a couple different articles from our friends at The Register on the launch of the HP IaaS cloud as a public beta. There really isn't a whole lot of information yet, but one thing seems unfortunately clear: HP has embraced the same backwards thinking as Amazon when it comes to provisioning, going against the knowledge and experience we've all gained in the past decade around sharing resources and over subscription.
Yes - it seems they are going to have fixed instance sizes and no apparent support for resource pools. This is especially depressing coming from someone like HP, who has technology like thin provisioning and partnerships with all of the major hypervisor players.
Is the software technology at the hypervisor just not there yet to provide such a service? vSphere 5 for example supports 1600 resource pools per cluster. I don't like the licensing model of 5, so I built my latest cluster on 4.1 - which supports 512 resource pools per cluster. Not a whole lot in either case but then again cluster sizes are fairly limited anyways.
There's no doubt that, gigabyte for gigabyte, DAS is cheaper than something like a 3PAR V800. But with fixed allocation sizes from the likes of Amazon, it's not uncommon to have disk utilization rates hovering in the low single digits. I've seen it at two different companies, and guess what: everyone else on the teams (all of whom have had more Amazon experience than me) was just as unsurprised as I was.
So you take this cheap DAS and you apply a 4 or 5% utilization rate to it, and all of a sudden it's not so cheap anymore, is it? Why is utilization so low? Well, in Amazon (since I haven't used HP's cloud), it's primarily low because that DAS is not protected: if the server goes down or the VM dies, the storage is gone. So people use other methods to protect their more important data. You can have the OS and log files and stuff on there, no big deal if that goes away, but again, you're talking about maybe 3-5GB of data (which is typical for me at least). Then the rest of the disk goes unused.
Go to the most inefficient storage company in the world and even they will drool at the prospect of replacing storage that you're only getting 5% utilization out of! Because really, even the worst efficiency is maybe 20% on older systems w/o thin provisioning.
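To make the point concrete, here is the utilization math as a tiny sketch. The raw per-GB prices and utilization rates are numbers I made up for illustration, not any vendor's actual figures; the shape of the result is what matters.

```python
# Illustrative only: the raw per-GB prices and utilization rates here are
# assumptions made up for this example, not any vendor's actual numbers.

def effective_cost_per_gb(raw_cost_per_gb, utilization):
    """Cost per GB of data actually stored, given the raw capacity price
    and the fraction of that capacity in use."""
    return raw_cost_per_gb / utilization

# "Cheap" DAS at 5% utilization vs a pricier array thin provisioned to 80%
das = effective_cost_per_gb(0.10, 0.05)
array = effective_cost_per_gb(0.50, 0.80)
print(f"DAS:   ${das:.2f}/GB stored")    # $2.00/GB stored
print(f"Array: ${array:.2f}/GB stored")  # $0.62/GB stored
```

At 5% utilization the "cheap" raw capacity ends up several times more expensive per GB of real data than storage that costs 5x more per raw GB but actually gets used.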
Even if the storage IS fully protected, the fixed allocation units are still way out of whack and they can't be shared! I may need a decent amount of CPU horsepower and/or (more likely) memory to run a bloated application, but I don't need several hundred gigabytes of storage attached to each system when 20GB would be MORE than enough (my average OS+App installation comes in at under 3GB, and that's with a lot of stuff installed)! I'd rather take those several hundred gigabytes, both in terms of raw space and IOPS, and give them to database servers or something like that (in theory at least; the underlying storage in this case is poor so I wouldn't want to use it for that anyways).
This is what 3PAR was built to solve: drive utilization (way) up, while simultaneously providing the high availability and software features of a modern storage platform. Others do the same too of course, with varying degrees of efficiency.
So that's storage; next take CPU. The industry average pre-virtualization was in the sub-20% utilization range; my own personal experience says it's in the sub-10% range for the most part. There was a quote from a government official a couple years back about how their data centers average about 7% utilization. I've done a few virtualization projects over the years, and my experience shows that even after systems have been virtualized, the VMware hosts themselves are at low utilization from a CPU perspective.
Two projects in particular that I documented while I was at a company a while back stand out, the most extreme perhaps being roughly 120 VMs on 5 servers, four of them being HP DL585 G1s, which were released in 2005. They had 64GB of memory on them, but they were old boxes. I calculated that the newer Opteron 6100, when it was released, had literally 12 times the CPU power (according to SPEC numbers at least) of the Opteron 880s that we had at the time. Anyways, even with these really old servers, the cluster averaged under 40% CPU, with peaks to maybe 50 or 60%. Memory usage was pretty constant at around 70-75%. Translate that workload onto something more modern and you'd likely see CPU usage rates drop to single digits while memory usage remains constant.
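The projection in that last sentence is simple arithmetic. The 12x figure is the SPEC-based comparison from the text; everything else follows from it:

```python
# Back-of-the-envelope consolidation math from the cluster described above.

old_avg_cpu = 0.40   # old DL585 G1 cluster averaged under 40% CPU
speedup = 12         # Opteron 6100 vs Opteron 880, per SPEC results

# Same workload, same number of hosts, 12x the CPU throughput:
projected_cpu = old_avg_cpu / speedup
print(f"Projected CPU utilization on modern hosts: {projected_cpu:.1%}")
```

That lands around 3%, i.e. single-digit CPU utilization, while memory (which doesn't get "faster" the same way) stays the bottleneck.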
I have no doubt that the likes of HP and Amazon are building their cloud to specifically not oversubscribe - to assume that people will utilize all of the CPU allocated to them as well as memory and disk space. So they have fixed building blocks to deal with and they carve them up accordingly.
The major fault with the design, of course, is that the vast majority of workloads do not fit in such building blocks and will never come close to utilizing all of the resources that are provided, thus wasting an enormous amount of resources in the environment. What's Amazon's solution to this? Build your apps to better utilize what they provide; basically, work around their limitations. Which, naturally, most people don't do, so resources end up being wasted on a massive scale.
I've worked for really nothing but software development companies for almost 10 years now, and I have never really seen even one company or group or developer ever, EVER design/build for the hardware. I have been part of teams that have tried to benchmark applications and buy the right-sized hardware, but it really never works out in the end, because a simple software change can throw all those benchmarks and testing out the window overnight (not to mention how traditionally difficult it is to replicate real traffic in a test environment; I've yet to see it done right myself for any even moderately complex application). The far easier solution is, of course, resource pools and variably allocated resources.
Similarly this model, along with the per-VM licensing model of so many different products out there, goes against the trend that has given us VM sprawl. Instead of running a single server with a half dozen different apps, it's become good practice to split those apps up. This fixed allocation unit of the cloud discourages such behavior by dramatically increasing the cost of doing it. You still incur additional costs by doing it on your own gear: memory overhead for multiple copies of the OS (assuming memory de-dupe doesn't work, which for me on Linux it doesn't), or disk overhead (assuming your array doesn't de-dupe, which 3PAR doesn't, but the overhead is so trivial here that it is a rounding error). But those incremental costs pale in comparison to the massive increases in cost in the cloud, because again of those fixed allocation units.
I have seen no mention of it yet, but I hope HP has at least integrated the ability to do live migration of VMs between servers. The hypervisor they are using supports it of course, I haven't seen any details from people using the service as to how it operates yet.
I can certainly see a need for cheap VMs on throwaway hardware. I see an even bigger need for the more traditional customers (who make up the vast, vast majority of the market) to have this model of resource pools instead. If HP were to provide both services, and a unified management UI, that really would be pretty nice to see.
The concept is not complicated, and is so obvious it dumbfounds me why more folks aren't doing it (my only thought is that perhaps the technology these folks are using isn't capable). IaaS won't be worthwhile to use, in my opinion, until we have that sort of system in place.
HP is obviously in a good position when it comes to providing 3PAR technology as a cloud: since they own the thing, their support costs would be a fraction of what their customers pay, and they would be able to consume unlimited software for nothing. Software typically makes up at least half the cost of a 3PAR system (the SPC-1 results and costs of course only show the bare minimum software required). Their hardware costs would be significantly less as well, since they would not need much (any?) margin on it.
I remember SAVVIS a few years ago wanting to charge me ~$200,000/year for 20TB usable of 10k RPM storage on a 3PAR array, when I could have bought 20TB usable of 15k RPM storage on a 3PAR array (+ hosting costs) for less than one year's costs at SAVVIS. I heard similar stories from 3PAR folks where customers would go out to the cloud to get pricing, thinking it might be cheaper than doing it in house, but always came back being able to show massive cost savings by keeping things in house.
They are also in a good position as a large server manufacturer to get amazing discounts on all of their stuff, and again of course don't have to make as much margin for these purposes (I imagine at least). Of course it's a double-edged sword: you risk pissing off current and potential customers that use your equipment when you compete in that same space.
I still have hope that, given HP's strong presence in the enterprise, and their in-house technology and technology partners, they will at some point offer an enterprise grade cloud: something where I can allocate a set amount of CPU and memory, maybe even get access to a 3PAR array using their Virtual Domain software, and then provision whatever I want within those resources. Billing would be based on some combination of a fixed price for base services and a variable price based on actual utilization (bill for what you use rather than what you provision), with perhaps some minimum usage thresholds (because someone has to buy the infrastructure to run the thing). So say I want a resource pool with 1TB of ram and 500GHz of CPU. Maybe I am forced to pay for 200GB of ram and 50GHz of CPU as a baseline, and then anything above that is measured and billed accordingly.
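The billing model I'm describing can be sketched in a few lines. The baseline figures are the ones from my example above; the dollar rates are invented purely for illustration:

```python
# Sketch of baseline-plus-metered billing. The 200GB/50GHz baseline comes
# from the example in the text; all dollar rates are made up.

def monthly_bill(used_ram_gb, used_cpu_ghz,
                 base_ram_gb=200, base_cpu_ghz=50, base_charge=500.0,
                 ram_rate=0.50, cpu_rate=2.00):  # $/GB-month, $/GHz-month
    """Fixed baseline plus metered overage; you never pay less than base."""
    ram_over = max(0.0, used_ram_gb - base_ram_gb)
    cpu_over = max(0.0, used_cpu_ghz - base_cpu_ghz)
    return base_charge + ram_over * ram_rate + cpu_over * cpu_rate

print(monthly_bill(150, 40))   # quiet month: just the baseline, 500.0
print(monthly_bill(600, 120))  # busy month: 500 + 400*0.5 + 70*2 = 840.0
```

The point is that the provider still gets a predictable floor (someone has to buy the infrastructure), while the customer's bill tracks what they actually use above it.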
Don't let me down HP.
I mentioned not long ago that I was going co-lo once again. I was co-lo for a while for my own personal services, but then my server started to act up (the server would be 6 years old if it were still alive today) with disk "failure" after failure (or at least that's what the 3ware card was predicting; eventually it stopped complaining and the disk never died again). So I thought: do I spend a few grand to buy a new box or go "cloud"? I knew up front cloud would cost more in the long run, but I ended up going cloud anyways as a stop gap. I picked Terremark because it had the highest quality design at the time (still does).
During my time with Terremark I never had any availability issues. There was one day where there was some high latency on their 3PAR arrays, though they found and fixed whatever it was pretty quickly (it didn't impact me all that much).
I had one main complaint with regards to billing: they charge $0.01 per hour for each open TCP or UDP port on their system, and they have no way of doing 1:1 NAT. For a web server or something this is no big deal, but I needed a half dozen or more ports open per system (mail, dns, vpn, ssh etc.) even after cutting down on ports I might not need, so it starts to add up; indeed, about 65% of my monthly bill ended up being these open TCP and UDP ports.
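It doesn't sound like much until you remember the charge runs around the clock. A quick sanity check on what $0.01/hour per open port works out to per month:

```python
# What the per-port charge adds up to: $0.01/hour per open port, billed 24x7.

RATE_PER_PORT_HOUR = 0.01

def monthly_port_cost(open_ports, hours=24 * 30):
    return open_ports * RATE_PER_PORT_HOUR * hours

print(f"${monthly_port_cost(1):.2f}/mo for one port")       # $7.20
print(f"${monthly_port_cost(6):.2f}/mo for a half dozen")   # $43.20
```

Multiply that across two systems each needing a half dozen or more ports and it's easy to see how the ports came to dominate the bill.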
Once both of my systems were fully spun up (the 2nd only recently, as I was too lazy to move it off of co-lo), my bill was around $250/mo. My previous co-lo was around $100/mo, and I think I had them throttle me to 1Mbit of traffic (this blog was never hosted at that co-lo).
The one limitation I ran into on their system was that they could not assign more than one IP address for outbound NAT per account. In order to run SMTP I needed each of my servers to have its own unique outbound IP, so I had to make a 2nd account to run the 2nd server. Not a big deal for me (it ended up being a pain for them, since their system wasn't set up to handle such a situation), since I only ran 2 servers (and the communications between them were minimal).
As I've mentioned before, the only part of the service that was truly "bill for what you use" was bandwidth usage, and for that I was charged between 10 and 30 cents/month for my main system and 10 cents/month for my 2nd system.
Oh - and they were more than willing to set up reverse DNS for me, which was nice (and required for running a mail server, IMO). I had to agree to a lengthy little contract that said I wouldn't spam in order for them to open up port 25. Not a big deal. The IP addresses were "clean" as well; no worries about black listing.
Another nice thing to have, if they offered it, would be billing based on resource pools; as usual, they charge for what you provision (per VM) instead of what you use. When I talked to them about their enterprise cloud offering, they did charge for the resource pool (unlimited VMs in a given amount of CPU/memory), but this is not available on their vCloud Express platform.
It was great to be able to VPN to their systems to use the remote console (after I spent an hour or two determining the VPN was not going to work in Linux, despite my best efforts to extract Linux versions of the VMware console plugin and try to use it). Mount an ISO over the VPN and install the OS; that's how it should be. I didn't need the functionality, but I don't doubt I would have been able to run my own DHCP/PXE server there as well if I wanted to install additional systems in a more traditional way. Each user gets their own VLAN, and is protected by a Cisco firewall and load balanced by a Citrix load balancer.
A couple of months ago the thought of off site backups came up again. I don't really have much "critical" data, but I felt I wanted to just back it all up, because it would be a big pain if I had to reconstruct all of my media files, for example. I have about 1.7TB of data at the moment.
So I looked at various cloud systems, including Terremark, but it was clear pretty quickly that no cloud company was going to be able to offer this service in a cost effective way, so I decided to go co-lo again. Rackspace was a good example; they have a handy little calculator on their site. This time around I went and bought a new, more capable server.
So I went to a company I used to buy a ton of equipment from in the bay area, and they hooked me up with not only a server with ESXi pre-installed on it, but also co-location services (with "unlimited" bandwidth) and on-site support, all for a good price. The on-site support is mainly because I'm using their co-location services (which is itself a co-lo inside Hurricane Electric) and their techs visit the site frequently as-is.
My server is a single socket quad core processor with 4x2TB SAS disks (~3.6TB usable, which also matches my usable disk space at home, which is nice). I went SAS because VMware doesn't support VMFS on SATA (though technically you can do it), and the price premium for SAS wasn't nearly as high as I was expecting. It has a 3ware RAID controller with battery backed write-back cache, a little USB thing for ESXi (I'd rather have ESXi on the HDD, but 3ware is not supported for booting ESXi), 8GB of Registered ECC ram, and redundant power supplies. It also has decent remote management with a web UI, remote KVM access, remote media etc. For co-location I asked for (and received) 5 static IPs: 3 IPs for VMs, 1 IP for ESX management, and 1 IP for out of band management.
My bandwidth needs are really tiny, typically 1GB/month, though now with off site backups that may go up a bit (in bursts). The only real drawback to my system is that the SAS card does not have full integration with vSphere, so I have to use a CLI tool to check the RAID status; at some point I'll need to hook up Nagios again and run a monitor to check on the RAID status. Normally I set up the 3ware tools to email me when bad things happen, pretty simple, but that's not possible when running vSphere.
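Since the 3ware email alerts aren't an option under vSphere, the check has to come from outside. Here's a rough sketch of the kind of Nagios-style check I mean: shell out to the CLI tool and flag any unit that isn't OK. The parsing is based on the general shape of tw_cli's unit listing, so treat the regex (and the controller path) as assumptions to verify against your own controller's output, not a drop-in monitor.

```python
import re
import subprocess

def units_not_ok(tw_cli_output):
    """Return the unit names (u0, u1, ...) whose status column isn't OK.
    Assumes lines shaped roughly like: 'u0  RAID-10  OK  ...' -- verify
    against your controller/firmware before trusting it."""
    bad = []
    for line in tw_cli_output.splitlines():
        m = re.match(r"^(u\d+)\s+\S+\s+(\S+)", line)
        if m and m.group(2) != "OK":
            bad.append(m.group(1))
    return bad

def check_raid(controller="/c0"):
    """Run tw_cli against the controller and return a Nagios-style line."""
    out = subprocess.run(["tw_cli", controller, "show"],
                         capture_output=True, text=True).stdout
    bad = units_not_ok(out)
    return ("CRITICAL: " + ", ".join(bad)) if bad else "OK"
```

Wire `check_raid()` into a Nagios plugin wrapper (exit 2 on CRITICAL) and you get roughly the alerting the 3ware tools used to provide by email.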
The amount of storage on this box I expect to last me a good 3-5 years. The 1.7TB includes every bit of data that I still have, going back a decade or more; I'm sure there's a couple hundred gigs at least that I could outright delete because I may never need it again. But right now I'm not hurting for space, so I keep it there, on line and accessible.
My current setup:
- One ESX virtual switch on the internet that has two systems on it: a bridging OpenBSD firewall, and a Xangati system sniffing packets (still playing with Xangati). No IP addresses are used here.
- One ESX virtual switch for one internal network; the bridging firewall has another interface here, my main two internet facing servers have interfaces here, and my firewall has another interface here as well for management. Only public IPs are used here.
- One ESX virtual switch for another internal network, for things that will never have public IP addresses associated with them; I run NAT on the firewall (on its 3rd/4th interfaces) for these systems to get internet access.
I have a site to site OpenVPN connection between my OpenBSD firewall at home and my OpenBSD firewall on the ESX system, which gives me the ability to directly access the back end, non routable network on the other end.
Normally I wouldn't deploy an independent firewall, but I did in this case because, well, I can. I do like OpenBSD's pf more than iptables (which I hate), and it gives me a chance to play around more with pf. It also gives me more freedom on the Linux end to fire up services on ports that I don't want exposed, without having to worry about individually firewalling them off, so it allows me to be more lazy in the long run.
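To illustrate the "be lazy on the Linux end" point: with a default-deny rule on the bridge, anything I forget to lock down on the Linux boxes stays unreachable until I explicitly allow it. A hypothetical pf.conf fragment along those lines (the interface name and addresses are made up for the example):

```
# Hypothetical pf.conf sketch - interface and addresses are placeholders.
ext_if   = "em0"
web_srv  = "203.0.113.10"
mail_srv = "203.0.113.11"

# Default deny inbound: covers any service I lazily left listening.
block in on $ext_if all

# Explicitly allow only what should be reachable from outside.
pass in on $ext_if proto tcp to $web_srv  port { 80, 443 }
pass in on $ext_if proto tcp to $mail_srv port { 25 }
pass out on $ext_if all keep state
```

That inversion (deny everything, open specific holes) is exactly what makes the laziness safe.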
I bought the server before I moved. Once I got to the bay area I went and picked it up, kept it over a weekend to copy my main data set to it, then took it back; they hooked it up again and I switched my systems over to it.
The server was about $2,900 with 1 year of support, and co-location is about $100/mo. So on disk space alone, the first year (taking into account the cost of the server) my cost is about $0.09 per GB per month (3.6TB), with subsequent years being about $0.033 per GB per month (I took a swag at the support cost for the 2nd year, so that is included). That doesn't even take into account the virtual machines themselves and the cost savings there over any cloud. And I'm giving the cloud the benefit of the doubt by not even looking at their cost of bandwidth, just the cost of capacity. If I was using the cloud I probably wouldn't allocate all 3.6TB up front, but even if you use 1.8TB, which is about what I'm using now with my VMs and stuff, the cost still handily beats everyone out there.
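For anyone who wants to check the arithmetic, here it is spelled out. The ~$225 year-2 support renewal is the swag mentioned above, not a quoted price:

```python
# The per-GB math behind the figures above.

server = 2900.0        # one-time cost, includes year 1 support
colo_month = 100.0
usable_gb = 3600       # ~3.6TB usable

year1_total = server + colo_month * 12
year1_per_gb_month = year1_total / usable_gb / 12
print(f"Year 1:  ${year1_per_gb_month:.3f}/GB/month")   # ~$0.095

year2_total = colo_month * 12 + 225.0   # support renewal is a guess
year2_per_gb_month = year2_total / usable_gb / 12
print(f"Year 2+: ${year2_per_gb_month:.3f}/GB/month")   # ~$0.033
```

Compare those numbers to $0.15-$0.25/GB/month from the cloud providers below and the gap is obvious, even before bandwidth or the VMs themselves enter the picture.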
What's most crazy is that I lack the purchasing power of any of these clouds out there. I'm just a lone consumer that bought one server. Granted, I'm confident the vendor I bought from gave me excellent pricing due to my past relationship, though probably still not on the scale of the likes of Rackspace or Amazon, and yet I can handily beat their costs without even working for it.
What surprised me most during my trips doing cost analysis of the "cloud" is how cheap enterprise storage is. I mean, Terremark charges $0.25/GB per month (on SATA-powered 3PAR arrays), and Rackspace charges $0.15/GB per month (I believe Rackspace just uses DAS). I kind of would have expected the enterprise storage route to cost say 3-5x more, not less than 2x. When I was doing real enterprise cloud pricing, storage for the solution I was looking for typically came in at 10-20% of the total cost, with 80%+ of the cost being CPU+memory. For me it's a no-brainer: I'd rather pay a bit more and have my storage on a 3PAR of course (when dealing with VM-based storage, not bulk archival storage). With the average cost of my storage for 3.6TB over 2 years coming in at $0.06/GB, it makes more sense to just do it myself.
I just hope my new server holds up. My last one lasted a long time, so I sort of expect this one to last a while too; it got burned in before I started using it, and the load on the box is minimal. I would not be too surprised if I can get 5 years out of it. How big will HDDs be in 5 years?
I will miss Terremark because of the reliability and availability features they offer; they have a great service, and now of course are owned by Verizon. I don't need to worry about upgrading vSphere any time soon, as there's no reason to go to vSphere 5. The one thing I have been contemplating is whether or not to put my vSphere management interface behind the OpenBSD firewall (which is a VM, of course, on the same box). Kind of makes me miss the days of ESX 3, when it had a built-in firewall.
I'm probably going to have to upgrade my cable internet at home. Right now I only have 1Mbps upload, which is fine for most things, but if I'm doing off site backups too I need more performance. I can go as high as 5Mbps with a more costly plan: 50Meg down / 5Meg up for about $125, but I might as well go all in and get 100Meg down / 5Meg up for $150. Both plans have a 500GB cap with a $0.25/GB charge for going over. Seems reasonable. I certainly don't need that much downstream bandwidth (not even 50Mbps; I'd be fine with 10Mbps), but I really do need as much upstream as I can get. Another option could be driving a USB stick to the co-lo, which is about 35 miles away. I suppose that is a possibility, but it's kind of a PITA given the distance, though if I got one of those 128GB+ flash drives it could be worth it. I've never tried hooking up USB storage to an ESX VM before; assuming it works? hmmmm..
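The upstream obsession makes more sense with the initial-seed math in front of you; pushing the full 1.7TB over residential upload speeds takes a while:

```python
# How long the initial 1.7TB backup seed takes at different upload speeds.

def days_to_upload(data_tb, mbps):
    bits = data_tb * 1e12 * 8            # decimal TB to bits
    return bits / (mbps * 1e6) / 86400   # seconds to days

print(f"{days_to_upload(1.7, 1):.0f} days at 1Mbps")  # ~157 days
print(f"{days_to_upload(1.7, 5):.0f} days at 5Mbps")  # ~31 days
```

Even at 5Mbps the initial seed is a month of saturated upstream, which is why the sneakernet option (driving a flash drive to the co-lo) stays on the table despite the 35 miles.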
Another option I have is AT&T Uverse, which I've read good and bad things about, but looking at their site, their service is slower than what I can get through my local cable company (which truly is local; they only serve the city I am in). Another reason I didn't go with Uverse for TV is that, due to the technology they are using, I suspected it is not compatible with my Tivo (with cable cards). AT&T doesn't mention their upstream speeds specifically, so I'll contact them and try to figure that out.
I kept the motherboard/CPUs/ram from my old server; my current plan is to mount it on a piece of wood and hang it on the wall as some sort of art. It has lots of colors and little things to look at; I think it looks cool, at least. I'm no handyman, so hopefully I can make it work. I was honestly shocked at how heavy the copper (I assume) heatsinks were. Wow, they felt like 1.5 pounds a piece, massive.
While my old server is horribly obsolete, one thing it has over my new server is being able to support more ram. The old server could go up to 24GB (I had a max of 6GB in it at the time); the new server tops out at 8GB (and I have 8GB in it). Not a big deal, as I don't need 24GB for my personal stuff, but I thought it was kind of an interesting comparison.
This blog has been running on the new server for a couple of weeks now. One of these days I need to hook up some log analysis stuff to see how many dozen hits I get a month.
If Terremark could fix three areas of their vCloud Express service - one being resource pool-based billing, another being relaxing the costs of opening multiple ports in the firewall (or just offering 1:1 NAT as an option), and the last being thin provisioning friendly billing for storage - it would really be a much more awesome service than it already is.
Sorry, my three readers out there, for not posting recently; I've been pretty busy! And to me there haven't been many events in the tech world in the past month or so that have gotten me interested enough to write about them.
I was talking with a friend of mine recently; he was thinking about either throwing a 1U server in a local co-location facility or playing around with one of the cloud service providers. Since I am still doing both (I've been too lazy to completely move out of the co-lo...), I gave him my own thoughts, and it sort of made me think more about the cloud in general.
What do I expect from a cloud?
When I'm talking cloud I'm mainly referring to IaaS, or Infrastructure as a Service. Setting aside cost modelling and stuff for a moment, I expect the IaaS to more or less just work. I don't want to have to care about:
- Power supply failure
- Server failure
- Disk drive failure
- Disk controller failure
- Scheduled maintenance (e.g. host server upgrades either software or hardware, or fixes etc)
- Network failure
- UPS failure
- Generator failure
- Dare I say it ? A fire in the data center?
- And I absolutely want to be able to run whatever operating system I want, and manage it the same way I would manage it if it was sitting on a table in my room or office. That means boot from an ISO image and install like I would anything else.
Hosting it yourself
I've been running my own servers for my own personal use since the mid 90s. I like the level of control it gives me and the amount of flexibility I have with running my own stuff. It also gives me a playground on the internet where I can do things. After multiple power outages over the first part of the decade, one of which lasted 28 hours, and the ~5th acquisition of my DSL provider, I decided to go co-lo. I already had a server, and I put it in a local, Tier 2 or Tier 3 data center. I could not find a local Tier 4 data center that would lease me 1U of space. So I lacked:
- Redundant Power
- Redundant Cooling
- Redundant Network
- Redundant Servers (if my server chokes hard I'm looking at days to a week+ of downtime here)
For the most part I guess I had been lucky; the facility had one, maybe two outages since I moved in about three years ago. The bigger issues with my server were that it was aging and the disks were failing; it was a pain to replace them, and it wasn't going to be cheap to replace the system with something modern and capable of running ESXi in a supported configuration (my estimates put the cost at a minimum of $4k). Add to that the fact that I need such a tiny amount of server resources.
Doing it right
So I had heard of Terremark from my friends over at 3PAR, and you know I like 3PAR, and they use VMware and I like VMware. So I decided to go with them rather than the other providers out there; they had a decent user interface and I got up and going fairly quickly.
So I've been running on it for almost a year, with pretty much no issues. I wish they had a bit more flexibility in the way they provision networking stuff, but nothing is perfect (well, unless you have the ability to do it yourself).
From a design perspective, Terremark has done it right, whether it's providing an easy to use interface to provision systems, using advanced technology such as VMware, 3PAR, and Netscaler load balancers, or building their data centers to be, well, fireproof.
Having the ability to do things like vMotion or Storage vMotion is absolutely critical for a service provider; I can't imagine anyone being able to run a cloud without such functionality, at least with a diverse set of customers. Having things like 3PAR's persistent cache is critical as well, to keep performance up in the event of planned or unplanned downtime in the storage controllers.
I look forward to the day where the level of instrumentation and reporting in the hypervisors allow billing based on actual usage, rather than what is being provisioned up front.
In case you're a less technical user, I wanted to outline a few of the abilities the technology Terremark uses offers their customers -
Memory Chip Failure (or any server component failure or change)
Most modern servers have sensors on them and for the most part are able to accurately predict when a memory chip is behaving badly, and to warn the operator of the machine to replace it. But unless you're running on some very high end specialized equipment (which I assume Terremark is not, because it would cost too much for their customers to bear), the operator needs to take the system off line in order to replace the bad hardware. So what do they do? They tell VMware to move all of the customer virtual machines off the affected server onto other servers. This is done without customer impact; the customer never knows it is going on. The operator can then take the machine off line, replace the faulty components, and then reverse the process.
Same applies to if you need to:
- Perform firmware or BIOS updates/changes
- Perform Hypervisor updates/patches
- Maybe your retiring an older type of server and moving to a more modern system
Disk drive failure
This one is pretty simple: a disk fails in the storage system and the vendor is dispatched to replace it, usually within four hours. But they may opt to wait a longer period of time for whatever reason; with 3PAR it doesn't really matter. There are no dedicated hot spares, so you're really in no danger of losing redundancy; the system rebuilds quickly using a many:many RAID relationship, and is fully redundant once again in a matter of hours (vs. days with older systems and whole-disk-based RAID).
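A toy model shows why the many:many relationship matters: with a dedicated hot spare, one disk has to absorb every rebuild write, while a distributed layout spreads those writes across many disks. The disk size and write speed below are round numbers for illustration, not measurements from any array:

```python
# Toy rebuild-time model. Disk size and per-disk write speed are
# illustrative round numbers, not measured figures.

def rebuild_hours(disk_tb, write_mb_per_s, writers):
    """Hours to rewrite disk_tb of data when `writers` disks share the work."""
    mb = disk_tb * 1e6
    return mb / (write_mb_per_s * writers) / 3600

print(f"{rebuild_hours(2, 50, 1):.1f}h with a dedicated spare")   # ~11.1h
print(f"{rebuild_hours(2, 50, 20):.1f}h spread across 20 disks")  # ~0.6h
```

Real systems have other factors (they only rebuild space actually in use, and throttle rebuild I/O), but the order-of-magnitude difference is the point.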
Storage controller software upgrade
There are fairly routine software upgrades on modern storage systems; the software feature set seems to just grow and grow. So the ability to perform the upgrade without disrupting users for too long (maybe a few seconds) is really important with a diverse set of customers, because there will probably be no good time where all customers say, "OK, I can have some downtime." So having high availability storage, with the ability to maintain performance while a controller is off line by mirroring the cache elsewhere, is a very useful feature to have.
Storage system upgrade (add capacity)
Being able to add capacity without disruption and dynamically re-distribute all existing user data across all new as well as current disk resources on-line to maximize performance is a boon for customers as well.
UPS failure (or power strip/PDU failure)
Unlike the small dinky UPS you may have in your house or office, UPSs in data centers typically power up to several hundred machines, so if one fails you may be in for some trouble. But with redundant power you have little to worry about; the other power feed takes over without interruption.
If a server power supply blows up, it has the ability to take out the entire branch or even the whole circuit that it's connected to. But once again, redundant power saves the day.
Uh-oh I screwed up the network configuration!
Well, now you've done it: you hosed the network (or maybe for some reason your system just dropped off the network, maybe a flakey network driver or something) and you can't connect to your system via SSH or RDP or whatever you were using. Fear not: establish a VPN to the Terremark servers and you can get console access to your system. If only the console worked from Firefox on Linux.. can't have everything I guess. Maybe they will introduce support for vSphere 4.1's virtual serial concentrators soon.
It just works
There are some applications out there that don't need the level of reliability that the infrastructure Terremark uses can provide, and they prefer to distribute things over many machines or many data centers or something. That's fine too, but most apps, almost all apps in fact, make the same common assumption, perhaps you can call it the lazy assumption: they assume that things will just work. Which shouldn't surprise many, because achieving that level of reliability at the application layer alone is an incredibly complex task to pull off. So instead you have multiple layers of reliability under the application, each handling a subset of availability; layers that have been evolving for years, or in some cases decades.
Terremark just works. I'm sure there are other cloud service providers out there that work too; I haven't used them all by any stretch (nor am I seeking them out, for that matter).
Public clouds make sense, as I've talked about in the past, for a subset of functionality. They have a very long way to go in order to replace what you can build yourself in a private cloud (assuming anyone ever gets there). For my own use case, this solution works.