Compellent beats expectations
Earlier in the year Compellent's stock price took a big hit following lowered sales expectations, and a bunch of legal stuff followed that. It seems they redeemed themselves yesterday, though, with their stock going up nearly 33% after they tripled their profits or something like that.
I've had my eye on Compellent for a couple of years now, though I don't remember where I first heard about them. They have similar technology to 3PAR, except that as far as I know it's implemented entirely in software on Intel CPUs, whereas 3PAR leverages ASICs (3PAR has Intel CPUs too, but they aren't used for too much).
I have heard field reports that because of this their performance is more on the lower end of things. They have never published an SPC-1 result, and I don't know anyone who uses them, so I don't know how they really perform.
They seem to use the same Xyratex enclosures that most everyone else uses. Compellent's controllers do seem to be somewhat on the low end of things, though I really have nothing to go on other than cache. With their high end controller coming in at only 3.5GB of cache (I assume 7GB mirrored for a pair of controllers?), it is very light on cache. The high end has a dual core 3.0GHz CPU.
The lower amount of cache combined with their software-only design and only two CPUs per controller and the complex automated data movement make me think the systems are built for the lower end and not as scalable, but I'm sure perfectly useful for the market they are in.
It would be nice to see how/if their software can scale if they were to put, say, a pair of 8 or 12 core CPUs in their controllers. After all, since they are leveraging x86 technology, performing such an upgrade should be pretty painless! Their controller specs have remained the same for a while now (as far back as I can remember). The bigger CPUs will use more power, but from a storage perspective I'm happy to give up a few hundred more watts if I can get 5x+ the performance. I don't have to think once, let alone twice.
They were, I believe, the first to have automagic storage tiering, and for that they deserve big props, though again no performance numbers have been posted (that I am aware of) that can illustrate the benefits this technology can bring to the table. I mean, if anybody can prove this strategy works it should be them, right? On paper it certainly sounds really nice, but in practice I don't know; I haven't seen indications that it's as ready as the marketing makes it out to be.
My biggest issue with automagic storage tiering is how fast the array can respond to "hot" blocks and optimize itself, which is why I think from a conceptual perspective I really like the EMC Fast Cache approach more (they do have FAST LUN and sub LUN tiering too). Not that I have any interest in using EMC stuff but they do have cool bits here and there.
Maybe Compellent is the next to get bought out (as a block storage company; yeah, I know they have their zNAS). I believe that from a technology standpoint they are in a stronger position than the likes of Pillar or Xiotech.
Anyway, that's my random thought of the day.
Amazon freebies
Two part post. Firstly, I would like to brief the community with a TechOpsGuys Public Service Announcement and inform you (if you haven't heard) that Amazon is offering free cloud services starting November 1. I know, old news (two days), but the James Hamilton blog has an excellent reader's digest version of the announcement here. Yes, lol his picture.. the guy is a damn rockstar with that 'do, but I encourage you to follow his blog and his more formal literature, which is often released after major industry expos. You will thank me after spending a few hours looking over his brilliant body of work.
Even more Amazon details can be found here.
Second, I will definitely be playing with each and every service's API using some of the excellent projects over at the githubs. If the frameworks around Amazon aren't drop dead simple or elegantly written, I'll probably write my own and post it to github, so stay tuned if you're interested. Of interest to me initially is testing some automated app deployment tools like chef, fabric, vlad, and some creative special sauce just for fun. I typically only write backend data integration type stuff since I function as a sysadmin all day, but I'm excited about writing custom apps (maybe even frontend) and exercising some creativity. I'll be sure to keep you all informed, and in the meantime check out the tools mentioned above and comment about some that you prefer.
One last thing: if this topic interests you, head on over to the devops toolchain and poke around for a while. Maybe even sign up for the mailing list and share some common interests with like-minded individuals. I subscribed to the digest and enjoy its light reading.
Rust in peace
So one of the former TechOpsGuys, Tycen, emailed our little team of people yesterday mentioning that the company where we all met (Dave, Jason, Nate & Tycen) is now dead and buried according to an article on TechFlash. We were all on the same operations team, in case it wasn't obvious.
Recruiting.com, the Seattle online recruiting startup formerly known as Jobster, has quietly been sold in a deal of unknown size to Phoenix, Arizona-based Jobing, TechFlash has learned. The acquisition, which closed in July, marks the final chapter for one of Seattle's most heavily funded Internet companies of the past decade. Founded in 2004 by Internet entrepreneur Jason Goldberg, Recruiting.com raised $55 million from a slate of investors that included Ignition Partners, Reed Elsevier Ventures, Trinity Ventures, Mayfield Fund and others.
Where did the money go? I don't know, we were really efficient in IT, and I think development was pretty good, so I can only think that it was the less technical areas of the company that spent most of the cash. $55 million doesn't seem like enough to say "one of the most heavily funded". Doesn't seem like much at all. I know it was a different era but I remember at Freeinternet.com (headquartered in Federal Way, near Seattle) they were spending $7 million/month on bandwidth they weren't even using (they didn't know they had it). That is just one of the little factoids.
The strategies the company had while I was there were typical? Nothing other than their original strategy made any sense to me, and unfortunately for them they kept pulling resources out of that original strategy (the one generating the majority of the revenue, but it wasn't shiny). We knew we were in trouble when we were denied the right to remove email addresses from our databases that were bouncing, which was getting us blacklisted on mail servers due to excessive bounces. Removing them would hurt our user count. Well, those users don't exist anymore!!
Jobster certainly provided me with several opportunities to round out some of my skills and experience. I had learned a lot prior to that but often worked with dedicated people whether it was networking people, storage people etc. So while I knew what I was doing it was nice to be able to build things up from scratch. Did a data center move while I was there, moving out of a facility that was plagued with power outages in cabinets spread out all over the place to a better facility in a consolidated cage. Got a lot more hands on with our Oracle Databases as well, at previous companies we had dedicated Oracle folks and they pretty much ran everything. Speaking of Oracle even transitioned the company from Enterprise Edition to Standard Edition which of course saved a ton of greenbacks.
I went looking for my stock certificates. I bought a few shares after I left, just in case, since the price was cheap (~$0.15). For some reason I could not find them; maybe they never sent them to me, I don't remember, it was a while ago. Not that I think the stock has any value, but I wanted to post a picture of what they looked like for some color.
It was an interesting place to work for at the time, I joined shortly after they launched their main product and it was pretty fast downhill from there. What can we say, we built the system to scale to the growth expectations that they had and that growth never materialized.
At the peak we had almost three racks of production equipment (15kW active 15kW backup power), in my mac daddy 47U 28" wide 49" deep Rittal TS-8 racks. Those racks are bad ass I gotta say. I could only fit three of them in a five rack cage (but due to power we had more than enough space in three). The cable management ROCKED (each server had at least 5 cables, which adds up fast). The extra 4" of width made all the difference. The Servertech PDUs were great too, loved the integrated environmental sensors, rear door fans to exhaust the hot air away from the equipment. I love big racks! (couldn't resist)
The new facility was really nice as well; the only downside is that it doesn't support high density stuff. I was hosted at the same place at the company I worked for before that, and due to power constraints we had to have 8 feet of space in between our rows of racks. Not that I minded too much; 8 feet was a lot of area to stretch out, put tables and shelves in for storage, etc. With ~30 foot ceilings it's one of the few data centers I've been to that didn't make me feel claustrophobic.
My strategy for the company at the time was to make their main product as good as possible and then partner with someone like LinkedIn to leverage their network with Jobster's technology. Jobster's main product was never visible to the public; it was a subscription-only product, so most people probably never saw the technology that the company was built on (initially at least). Their strategy was OOH, SHINY! rinse and repeat (at least 3 times while I was there).
Inhale… PKI and XMLRPC… exhale
I've been working on a project the last few days to automate the creation, revocation, and all around management of a PKI. Look, if I want to keep a team of two guys in charge of hundreds and even thousands of engineers, multiple sites, terabytes of data, private cloud, and continued explosive growth, then automation is key.
Anyhow, I'm finally finished and very pleased with the end result. I believe it to be a fair balance between security, functionality, and extensibility. Without giving away all of the keys to the kingdom the architecture is as follows (for creation):
- Account provisioning automation tier submits authenticated XMLRPC request to the isolated CA VM over SSL with the common name as part of the message.
- CA creates the key and encrypts it with a random password, generates the CSR, signs the CSR, generates the encrypted PKCS12 file, and publishes the certs to the applicable LDAP object attributes.
- XMLRPC instance responds back with (N)ACK
- XMLRPC instance logs the entire lifecycle of key management consistently
The corollary of course (revocation):
- Account provisioning automation tier submits authenticated XMLRPC request to the isolated CA VM over SSL with the common name as part of the message.
- CA revokes the certificate subsequently updating the CRL
- The CRL gets pushed to a git repository with all of the other configs; published with configuration management utilities (think cfengine, puppet, or chef)
- The CRL finally makes its way to the systems that matter via said config management software and dependent services are bounced.
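The two flows above could sit behind a very small XML-RPC surface. What follows is only a rough sketch using Python's standard xmlrpc.server module, not the actual implementation: the CAService class, the new_user and revoke_user method names, and the in-memory bookkeeping are all hypothetical stand-ins. Key generation, CSR signing, PKCS12 packaging, LDAP publishing, request authentication and the SSL wrapping are deliberately left out.

```python
import logging
from xmlrpc.server import SimpleXMLRPCServer

log = logging.getLogger("pki")


class CAService:
    """Hypothetical minimal CA interface: two calls, each returning (ok, msg)."""

    def __init__(self):
        self.issued = set()   # common names that currently hold a certificate
        self.revoked = []     # stand-in for the CRL

    def new_user(self, common_name):
        # A real CA would create the key, sign the CSR, build the PKCS12
        # file and publish to LDAP here; this sketch only tracks state.
        if common_name in self.issued:
            log.warning("create refused, already issued: %s", common_name)
            return (False, "certificate already exists for " + common_name)
        self.issued.add(common_name)
        log.info("issued certificate for %s", common_name)
        return (True, "issued " + common_name)

    def revoke_user(self, common_name):
        if common_name not in self.issued:
            log.warning("revoke refused, unknown: %s", common_name)
            return (False, "no certificate for " + common_name)
        self.issued.remove(common_name)
        self.revoked.append(common_name)  # a real CA would regenerate the CRL
        log.info("revoked certificate for %s", common_name)
        return (True, "revoked " + common_name)


def build_server(host="127.0.0.1", port=8000):
    """Register only the two calls, so nothing else is reachable remotely."""
    ca = CAService()
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(ca.new_user, "new_user")
    server.register_function(ca.revoke_user, "revoke_user")
    return server
```

Calling build_server().serve_forever() would start it. Note that over the wire XML-RPC has no tuple type, so a client sees each (ok, msg) result as a two element array.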
What's important to note is the fact that I use a very minimal XMLRPC interface to the PKI. No administrators logging in and rooting around doing manual one offs with the PKI. Every interaction is now forced to be consistent, audited, and in my opinion secure. Even better the certificates are signed according to the purpose of creation (obviously following RFC3280 as closely as possible). More specifically, if a certificate is for a user then the proper extension attributes are populated and likewise for server-side components. This further ensures compliance with the PKI and TLS related standards.
As far as logging, I made sure to keep logging very consistent with the rest of the framework that the PKI component is now just another plugin of. Basically, this means a tuple of (boolean, 'msg') which allows for clean and easy flow control. An example is:
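    def add_user(params):
        # "ssl" here is assumed to be the framework's own PKI plugin module,
        # not the Python standard library ssl module
        c = ssl.PKI()
        log('Adding new user to PKI')
        rc, msg = c.NewUser(params)
        if rc:
            log('Successfully added new user to PKI')
            # <do something>
        # every call hands back the same (boolean, 'msg') tuple
        return (rc, msg)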
That code snippet is 100% made up on the spot but you see that having consistent logging and return codes is quite valuable and frankly easy to read.
Ok, so that's cool, but I would be lying if I said I got it right the first time. This project went through a few different POCs including reliance upon M2Crypto, python-openssl, a mix of the two, libPKI (C), some mixed up crap out of github, and even a wrapper around the openssl command line utility. The cat's not out, but suffice to say I ultimately went with what I thought was the most feature rich and elegant solution.
Such work coming from the IT group is what I think of when I read about Next Practice or, with a more modern spin, Infrastructure as Code. It's what I expect out of a certain elite few on my team, and if you're a big enterprise without such individuals then you either don't know what you are missing or are actively trying to poach the individual out of some other nervous bastard's clenched grip. The individual is for damn sure employed, and you better have a nice package or some exciting work like self-driving cars to steal her.
Open Source clouds
Not sure how I missed ElasticHosts (maybe because they are UK only right now) but they're a seemingly viable public cloud based on KVM. Too cool, and the only one of its kind AFAIK (please comment if you are aware of others). KVM is an awesome project and I've been following the development mailing list for quite a while, so I'd like to think I know just how cool it is. For those of you who claim it isn't "enterprise class" or even the more ridiculous claim that it "... isn't a type 1 hypervisor", get a life and learn what a kernel module is. The HCL is far superior to any other hypervisor's, not to mention the code contributions are coming from quite highly respected engineers from a number of different companies and ISVs.
Still, the technology mentioned serves as merely a tangent in comparison to this week's news and releases in the Open Source cloud space. The folks over at OpenStack pushed out a development release of the "compute" variety and a production release of the "storage" tier. Fantastic indeed, but I am more excited about the OpenNebula project and the major release just announced. The feature set of OpenNebula is unbelievable and growing (see the link). I currently manage a vSphere cluster, and when comparing the vCloud Director to some of these Open Source alternatives I'm leaning (nay running) toward the more freely available, extensible, and easier to automate solution for a private cloud deployment. Sure, if all you know how to do is click Next -> Next -> Finish then OpenNebula and the like will scare you, but if you really want to unlock the feature sets then vCloud Director (Oracle, .Net, and 64-bit Windows requirements) feels more like IE6 in today's browser wars.
IPv4 address space exhaustion – tired
Just saw YASOSAIV6 (Yet another story on Slashdot about IPv6)..
They've been saying it for years, maybe even a decade: we are running out of IPs and we need to move to IPv6. It's taken forever for various software and hardware manufacturers to get IPv6 into their stacks, and even now most of them haven't seen much real world testing. IPv6 is of course a chicken and egg problem.
My take on it, from a technological standpoint I do not look forward to IPv6, not at all. Really for one simple reason - IPv4 IP addresses are fairly simple to remember, and simpler to recognize. IPv6 - forget about it. I'm a simple minded person and that is a simple reason I don't look forward to IPv6.
I don't have a problem with Network Address Translation (NAT). It's amazing to me how many people out there absolutely despise NAT. I won't spend much time talking about why I think NAT is a good thing because I have better things to spend my time on.
[And yes when I'm not using NAT I absolutely run my firewalls in bridging mode again for simplicity purposes]
I don't believe we have an IPv4 crisis yet. Sure, IANA or whoever is the organization that assigns IP addresses says we are low on that free pool, but guess what, service providers around the world have gobs of unused IPs. I talk to service providers fairly often and none of them are concerned about it, though they do want you to be smart about IP allocation. I suppose if you're some big company and want to get 5,000 IP addresses you may need to be concerned, but for smaller organizations who may need a dozen or two dozen IPs at the most, there's really nothing to worry about.
One thing I think could free up a bunch of IPs and allow IPv4 to scale even further is to somehow fix the SSL/TLS/HTTPS protocol(s) so that it can support virtual hosts (short of using wild card certs). I'm sure it's possible but it won't be easy to get the software out to the field to all the various edge devices in order to be able to support it. One company I worked at needed about a hundred IPs JUST for SSL (wild card certs were not an option at the time due to lack of client side support).
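For what it's worth, TLS does define a Server Name Indication (SNI) extension (RFC 6066) in which the client names the host it wants during the handshake, so one IP can serve several certificates; as noted above, it only helps once clients and edge devices actually support it. Here is a rough sketch of the server side using Python's ssl module. The hostnames and the CONTEXTS table are hypothetical, and certificate loading is left out, so this shows the dispatch only.

```python
import ssl

# Hypothetical virtual hosts; in practice each context would also call
# load_cert_chain() with that site's certificate and key files.
HOSTNAMES = ["www.example.com", "shop.example.com"]
CONTEXTS = {name: ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) for name in HOSTNAMES}


def choose_context(ssl_socket, server_name, default_context):
    """SNI callback: swap in the context that matches the requested host."""
    ctx = CONTEXTS.get(server_name)
    if ctx is not None:
        ssl_socket.context = ctx
    # Returning None tells the ssl module to carry on with the handshake.
    return None


def make_listener_context():
    """One context, bound to one IP/port, that dispatches by hostname."""
    default = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    default.sni_callback = choose_context
    return default
```

A client that sends no server name gets the default context, which is why older clients were the sticking point.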
I know we'll get to IPv6 eventually, and I'll accept that when we get there, though it may be far enough out that I don't deal with lower level stuff anymore so won't need to be concerned about it, I don't know.
Red Hat wants to end “IT Suckage”
Read an interesting article over on The Register with a lot of comments by a Red Hat executive.
And I can't help but disagree with a bunch of stuff the executive says. But it could be because the executive is looking at and talking with big, bloated, slow moving organizations that have a lot of incompetent people in their ranks (the "Never got fired for buying X" mantra), instead of smaller, more nimble, more leading edge organizations that are willing, ready and able to take some additional "risk" for a much bigger return. Running virtualized production systems is one such risk; it seems like a common concept to many, but I know there are a bunch of people out there who aren't convinced that it will work. (BTW I ran my first VMware in production in 2004, and saved my company BIG BUCKS with the customer. That's a long story, and an even longer weekend.)
OK so this executive says
After all, processor and storage capacity keep tracking along on their respective Moore's and Kryder's Laws, doubling every 18 months, and Gilder's Law says that networking capacity should double every six months. Those efficiencies should lead to comparable economies. But they're not.
I was just thinking this morning about the price and capacity of the latest systems (sorry, I keep going back to the BL685c G7 with 48 cores and 512GB of RAM).
I remember back in the 2004/2005 time frame the company I was at paying well over $100,000 for an 8-way Itanium system with 128GB of memory to run Oracle databases. The systems of today, whether it is the aforementioned blade or countless others, can run circles around such hardware at a tiny fraction of the price. It wasn't unreasonable just a few short years ago to pay more than $1M for a system that had 512GB of memory and 24-48 CPUs, and now you can get it for less than $50,000 (in this case using HP web pricing). That big $1M system probably consumed at least 5-10kW of power and a full rack as well, whereas now the same capacity can go for ~800W (at 100% load, off the top of my head) and you can get at least 32 of them in a rack (barring power/cooling constraints).
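To put rough numbers on that comparison, here is a back-of-the-envelope calculation using only the figures quoted above. Taking the midpoint of the 5-10kW range and treating "more than" and "less than" as exact values are assumptions made purely to get single numbers out.

```python
# Figures quoted in the paragraph above; the power midpoint is an assumption
# made only to turn a range into a single number.
old_price_usd = 1_000_000   # "more than $1M"
new_price_usd = 50_000      # "less than $50,000"
old_power_w = 7_500         # midpoint of 5-10kW
new_power_w = 800           # "~800W" at full load
blades_per_rack = 32        # "at least 32 of them in a rack"
memory_gb = 512             # same memory capacity in both systems

price_ratio = old_price_usd / new_price_usd
power_ratio = old_power_w / new_power_w
old_usd_per_gb = old_price_usd / memory_gb
new_usd_per_gb = new_price_usd / memory_gb
rack_memory_tb = blades_per_rack * memory_gb / 1024
rack_power_kw = blades_per_rack * new_power_w / 1000

print(f"price: {price_ratio:.0f}x cheaper, power: {power_ratio:.1f}x lower")
print(f"$/GB of RAM: {old_usd_per_gb:.0f} -> {new_usd_per_gb:.0f}")
print(f"full rack: {rack_memory_tb:.0f} TB of RAM at {rack_power_kw:.1f} kW")
```

The last line is why the power/cooling caveat matters: a rack full of these at full load lands well above what many facilities will feed a single rack.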
Granted that big $1M system was far more redundant and available than the small blade or rack mount server, but at the time if you wanted so many CPU cores and memory in a single system you really had no choice but to go big, really big. And if I was paying $1M for a system I'd want it to be highly redundant anyways!
With networking, well, 10GbE has gotten to be dirt cheap. Just think back a few years: if you wanted a switch with 48 x 10GbE ports you'd be looking at, I'd say, $300k+ and it'd take the better part of a rack. Now you can get such switches in a 1U form factor from some vendors (2U from others), for sub $40k?
With storage, well, spinning rust hasn't evolved all that much over the past decade for performance, unfortunately, but technologies like distributed RAID have managed to extract an enormous amount of untapped capacity out of the spindles that older architectures are simply unable to exploit. More recently, with the introduction of SSDs and the sub LUN automagic storage tiering technology that is emerging (I think it's still a few years away from being really useful), you can really get a lot more bang out of your system. EMC's Fast Cache looks very cool too, from a conceptual perspective at least; I've never used it and don't know anyone who has, but I do wish 3PAR had it! Assuming I understand the technology right, the key is that the SSDs are used for both read and write caching, versus something like the NetApp PAM card, which is only a read cache. Neither Fast Cache nor PAM is enough to make me want to use those platforms for my own stuff.
The exec goes on to say
Simply put, Whitehurst's answer to his own question is that IT vendors suck, and that the old model of delivering products to customers is fundamentally broken.
I would tend to agree for the most part but there are those out there that really are awesome. I was lucky enough to find one such vendor, and a few such manufacturers. As one vendor I deal with says they work with the customer not with the manufacturer, they work to give the customer what is best for them. So many vendors I have dealt with over the years are really lazy when it comes down to it, they only know a few select solutions from a few big name organizations and give blank stares if you go outside their realm of comfort (random thought: I got the image of Speed Bump: The roadkill possum from a really old TV series called Liquid Television that I watched on MTV for a brief time in the 90s).
By the same token, while most IT vendors suck, most IT managers suck too, for the same reason. Probably because most people suck; that may be what it comes down to at the end of the day. IT, as you well know, is still an emerging industry, still a baby really, evolving very quickly, but it has a ways to go. So like with anything, the people out there that can best leverage IT are few and far between. Most of the rest are clueless -- like my first CEO, who about 10-11 years ago was convinced he could replace me with a tech head from Fry's Electronics (despite my 3 managers telling him he could not). About a year after I left the company he did in fact hire such a person -- only problem was that individual never showed up for work (maybe he forgot).
Exec goes on to say..
"Functionality should be exploding and costs should be plummeting — and being a CIO, you should be a rock star and out on the golf course by 3 pm," quipped Whitehurst to his Interop audience.
That is in fact what is happening -- provided you're choosing the right solutions and have the right people to manage them. The possibilities are there; it's just that most people don't realize it or don't have the capacity to evolve into what could be called the next generation of IT. They have been doing the same thing for so long that it's hard to change.
Speaking of being a rock star and out on the golf course by 3pm, I recall two things I've heard in the past year or so-
The first one used the golf course analogy, and came from a local VMware consulting shop that has a bunch of smart folks working for them. I thought this was a really funny strategy and can see it working quite well in many cases. The person took an industry average of, say, 2-3 days to provision a new physical system, and said that in the virtual world you don't tell your customers that you can provision that new system in ten minutes. Tell them it will take you 2-3 days, spend the ten minutes doing what you need, and spend the rest of the time on the golf course.
The second one was from a 3PAR user, I believe, who told one of their internal customers/co-workers something along the lines of "You know how I tell you it takes me a day to provision your 10TB of storage? Well I lied, it only takes me about a minute".
For me, I'm really too honest I think, I tell people how long I think it will really take and at least on big projects am often too optimistic on time lines. Maybe I should take up Scotty's strategy and take my time lines and multiply them by four to look like a miracle worker when it gets done early. It might help to work with a project manager as well, I haven't had one for any IT projects in more than five years now. They know how to manage time (if you have a good one, especially one experienced with IT not just a generic PM).
Lastly the exec says
The key to unlocking the value of clouds is open standards for cloud interoperability, says Whitehurst, as well as standardization up and down the stack to simplify how applications are deployed. Red Hat's research calculates that about two-thirds of a programmer's time is spent worrying about how the program will be deployed rather than on the actual coding of the program.
Worrying about how the program will be deployed is a good thing, an absolutely good thing. Rewinding again to 2004, I remember a company meeting where one of the heads of the company stood up and said something along the lines of "2004 was the year of operations, we worked hard to improve how the product operates, and the next phase is going back to feature work for customers." I couldn't believe my ears. That year was the worst for operations, filled with half implemented software solutions that actually made things worse instead of better. Outages increased, stress increased, turnover increased.
The only thing I could do from an operations perspective was buy a crap load of hardware and partition the application to make it easier to manage. We ended up with tons of excess capacity, and the development teams were obviously unable to make the design changes we needed to improve the operations of the application, but we at least had something that was more manageable. The deployment and troubleshooting teams were so happy when the new stuff was put into production; no longer did they have to parse gigabyte sized log files trying to find which errors belonged to which transactions from which subsystem. Traffic for different subsystems was routed to different physical systems, so you knew that if there was an issue with one type of process you went to server farm X to look at it. Problem resolution was significantly faster.
I remember having one conversation with a software architect in early 2005 about a particular subsystem that was very poorly implemented (or maybe even designed). It caused us massive headaches in operations, non stop problems really. His response was "Well, I invited you to an architecture meeting in January of 2004 to talk about this but you never showed up." I don't remember the invite, but if I saw it I know why I didn't show up: I was buried in production outages 24/7 and had no time to think more than 24 hours ahead, let alone think about a software feature that was months away from deployment. I just didn't have the capacity; I was running on fumes for more than a year.
So yes, if you are a developer please do worry about how it is deployed, never stop worrying. Consult your operations team (assuming they are worth anything), and hopefully you can get a solid solution out the door. If you have a good experienced operations team then it's very likely they know a lot more about running production than you do and can provide some good insight into what would provide the best performance and uptime from an operations perspective. They may be simple changes, or not.
One such example: I was working at a now defunct company that had a hard on for Ruby on Rails. They were developing app after app on this shiny new platform. They were seemingly trying to follow Services Oriented Architecture (SOA), something I learned about, ironically, at a Red Hat conference a few years ago (I didn't know there was an acronym for that sort of thing, it seemed so obvious). I had a couple of really simple suggestions for them to take into account for how we would deploy these new apps. Their original intentions called for basically everything running under a single apache instance (across multiple systems), so that, for example, if Service A wanted to talk to Service B it would talk to that service on the same server. My suggestions, which we went with, involved two simple concepts:
- Each application had its own apache instance, listening on its own port
- Each application lived behind a load balancer virtual IP with associated health checking, with all application-to-application communication flowing through the load balancer
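The second point can be sketched in a few lines. This is only an illustration of a health-checked pool behind a virtual IP, not how any particular load balancer works: the Pool class, the backend names and ports are hypothetical, and the probe is passed in as a function so the sketch stays self-contained (a real load balancer would make a TCP or HTTP check here).

```python
import itertools


class Pool:
    """Hypothetical stand-in for one load balancer virtual IP."""

    def __init__(self, name, backends, probe):
        self.name = name
        self.backends = list(backends)   # (host, port) pairs, one port per app
        self.probe = probe               # callable: (host, port) -> bool
        self.healthy = list(self.backends)
        self._cursor = itertools.count()

    def run_health_checks(self):
        """Keep only the backends whose probe currently succeeds."""
        self.healthy = [b for b in self.backends if self.probe(*b)]

    def pick(self):
        """Round-robin over healthy backends; fail loudly if none are left."""
        if not self.healthy:
            raise RuntimeError("no healthy backends behind " + self.name)
        return self.healthy[next(self._cursor) % len(self.healthy)]
```

Because Service A only ever asks the pool for Service B, a dead apache instance on one server drops out at the next health check instead of taking down every caller that happened to live on that box.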
Towards the end we had upwards of I'd say 15 of these apps running on a small collection of servers.
The benefits are pretty obvious, but the developers weren't versed in operations -- which is totally fine they don't need to be (though it can be great when they are, I've worked with a few such people though they are VERY RARE) that's what operations people do and you should involve them in your development process.
As for cloud standards -- folks are busy building those as we speak and type. VMware seems to be the furthest along from an infrastructure cloud perspective I believe, I wouldn't expect them to lose their leadership position anytime soon they have an enormous amount of momentum behind them, and it takes a lot to counter that momentum.
About a year ago I was talking to some former co-workers who told me another funny story: they were launching a new version of software to production, and the software had been crashing their test environments daily for about a month. They had a go/no-go meeting in which everyone involved with the product said NO GO. But management overrode them, and they deployed it anyways. The result? A roughly 14 hour production outage while they tried to roll the software back. I laughed and said, things really haven't changed since I left, have they?
So the solutions are there; the software companies and hardware companies have been evolving their stuff for years. The problem is the concepts can become fairly complex when talking about things like capacity utilization and stranded resources. Getting the right people in place to be able to not only find such solutions but deploy and manage them as well can really go a long ways, but those people are rare at this point.
I haven't been writing too much recently, I've been really busy. Scott looks to be doing a good job so far though.
Runt post
I always liked how the guys over at the PacketPushers podcast describe the short podcasts as "Runts". If you're a networking junkie the reference is obvious. Having said that, this is a short and unexpected post on something that just came across my Google reader.
If you have been following the industry buzz around DevOps, Agile software development, and continuous deployment then you must respect the guys over at Wealthfront (formerly kaChing). No need for this post to be any longer, just read this. Wow.
Sunday reading
I was enjoying a beverage and reading an interesting LWN.net article on the topic of the Linux block layer and SSDs. Most interesting to me were the references to lessons learned in the networking stack. Quite rightly in my opinion. If you've been a Linux admin with an eye on networking since about the 2.6.9 release (guessing but sounds about right) then you are already familiar with technologies like:
Obviously, these were needed to keep up with the increasing rate of PPS and throughput requirements of 1Gb+. Hell, one of the most advertised features of NAPI is polling under pressure instead of being 100% interrupt driven. These enhancements are now being dusted off and reviewed once more, but for inclusion in a completely different subsystem.
The article was illuminating to me since I had not been aware of the technical issues in the block layer. I wasn't blind-sided by the news, since it's obvious to any technologist that a jump from a few hundred IOPS per device to several tens of thousands or hundreds of thousands is going to expose inefficiencies; not to mention the block subsystem has been designed to avoid drive seeks, an effort which is largely overhead when dealing with SSDs. It seems the work ahead is to remove locking in many locations, better handle SMP systems, and figure out how in the world to best handle crappy controllers. For instance, newer NICs come with multiple RX/TX queues so efficient use of SMP systems can be hardware offloaded (especially when the address / port tuple is hashed in hardware), but AFAIK this is not the case in storage controllers. It will be interesting to see how much the improvements to the block layer mirror those of the networking subsystem.
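The multi-queue idea is easy to picture in code: hash the flow's address / port tuple, and use the result to pick a queue, so every packet of a given flow lands on the same queue (and therefore the same CPU) with no shared lock. A toy sketch follows; the function name is made up, and zlib.crc32 is just a convenient stand-in for whatever hash the hardware really computes.

```python
import zlib


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Map a flow's address/port tuple to a queue index.

    zlib.crc32 is a stand-in chosen for this sketch; real NICs compute
    their own hash in hardware.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues
```

The same flow always maps to the same queue, while different flows spread across the queues, which is what lets each CPU service its own queue independently.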
I'm still getting used to blogging and can't quite tell if these posts are too long or too short, so hang with me as I find my pace. I'll have another one here shortly about how cool the Mantis bugtracker is and how I extended the built-in SOAP API and made use of it to automate interactions. Very exciting.
Open Source Innovation
I've been doing a bit of window shopping in preparation for a private cloud implementation next year and am very excited about the maturity, speed, and awesomeness of the open source community. Awesomeness of course means APIs and python / ruby bindings for a number of these projects, or in the case of the openstack nova project, that it's largely written in python. Many of the technologies used with openstack (rabbitmq, nginx, and redis) scale quite nicely and are used in production at some very large infrastructures, for which an IT guy can find some very cool tech talks. These tools and crew are the way of the future, and IT should really get familiar with them and comfortable extending them with some special sauce which unlocks the true value-add in your vertical.
Technologies like ceph and sheepdog look equally cool on the file system side of the house. Although still pretty green, this video highlights some of the features of sheepdog which could become quite cool indeed. 4MB chunks distributed across however many commodity servers, sans meta data server / bottleneck. Now, don't be silly and think I'm serious about putting something like sheepdog or ceph in production today, but the technology is moving fast and it's absolutely worth following.. especially since ceph was included upstream in 2.6.34 and sheepdog is currently usable by qemu (which implies KVM). Not to mention, sheepdog in documentation appears to be incredibly simple to understand, manage, and operate. Quite a contrast to its competition. I've been a victim of Red Hat's GFS in the past and toyed with other distributed file systems like VMFS (before going directly to NFS and collecting $200) and have decided that they are far too complicated and awful for me.
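To see how chunks can be spread across servers without a metadata server, here is a small sketch in which any client can compute a chunk's location from its id alone. It uses a consistent-hash ring purely as an illustration and should not be read as sheepdog's or ceph's actual placement algorithm; the Ring class, node names, virtual node count and number of copies are all assumptions. Only the 4MB chunk size comes from the text above.

```python
import bisect
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # 4MB chunks, as described above


def _hash(value):
    """Stable 64-bit hash so every client computes the same placement."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several points on the ring to even out the load.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.points]

    def nodes_for(self, volume, offset, copies=2):
        """Return the distinct nodes holding the chunk that covers offset."""
        chunk_id = f"{volume}/{offset // CHUNK_SIZE}"
        start = bisect.bisect(self.keys, _hash(chunk_id))
        found = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in found:
                found.append(node)
            if len(found) == copies:
                break
        return found
```

With something like Ring(["node-a", "node-b", "node-c"]).nodes_for("vm1-disk", 0), the lookup is pure arithmetic on the client, so there is no central server to ask and nothing to become a bottleneck.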
Simplicity is the name of the game now for an infrastructure guy like myself. I'm done hyper analyzing every minute detail of the stack and want to accomplish the task at hand with significantly less ramp up time, cost, and complexity. I need a licensing model that scales (GPL, BSD, Apache, etc) and a product that is extensible. Your engineering and finance department will thank you!
