Saw! an! article! today! and! thought! of! a! somewhat! sad! situation,! at! least! for! those! at! Yahoo!
Not long ago, Google announced they would be giving every employee in the company a 10% raise starting January 2011. One super bad ass engineer is apparently going to get a $3.5M retention bonus to not go to the competition. Lucky for him, perhaps, that Google is based in California, where non-competes are not enforceable.
Now Yahoo! has announced somewhat of the opposite: no raises; in fact, they are going to give the axe to 10% of their employees.
It's too bad that Yahoo! lost its way so long ago. There was a really good blog post about what went wrong with Yahoo! going back more than a decade, with really interesting insight into the company.
"The TPC-E results suggest a promising direction for future investigation. We chose an architecture that scales linearly over many orders of magnitude on commodity machines, but we've seen that this costs a significant 30-fold overhead compared to traditional database architectures."
Kind of makes you think... I guess if you're operating at the scale they are, the overhead is not a big deal; they'll probably find a way to reduce (ha ha, map reduce, get it? sorry) it over time.
From the group of people that brought the MapReduce algorithm to a much broader audience (despite the concepts being decades old), Google has now outgrown it and is moving on, according to our friends at The Register.
The main reason behind it is that MapReduce was hindering their ability to provide near-real-time updates to their index. So they migrated their search infrastructure to a Bigtable distributed database. They also optimized the next-generation Google file system for this database, making it inappropriate for more general uses.
MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."
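The compounding-risk point above is easy to see with a little arithmetic. A minimal sketch (my own illustration, not Google's code): assume each batch phase independently succeeds with some probability, so a pipeline of n sequential phases only completes if every phase does.

```python
def pipeline_success_probability(per_phase_success: float, phases: int) -> float:
    """Probability that every one of `phases` sequential batch phases
    succeeds, assuming independent per-phase success probabilities.
    With chained map-reduces this shrinks as phases are added, which is
    the growing failure risk described above."""
    return per_phase_success ** phases


if __name__ == "__main__":
    # Even a 99%-reliable phase looks shaky once you chain enough of them.
    for n in (1, 10, 50, 100):
        print(f"{n:3d} phases -> {pipeline_success_probability(0.99, n):.3f}")
```

The exact numbers are illustrative, but the shape of the curve is the argument: a long chain of batch jobs is dominated by its weakest (or slowest) phase, which is why stragglers hurt so much.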
I have to wonder how much this new distributed database-based index was responsible for Google's ability to absorb upwards of a 7-fold increase in search traffic when the Google Instant feature launched.
I had an interview at a company a couple of months ago that was trying to use Hadoop + MapReduce for near-real-time operations (the product had not launched yet), and I thought that wasn't a very good use of the technology. It's a batch-processing system. Google, of course, realized this and ditched it when it could no longer scale to the levels of performance they needed (despite having an estimated 1.8 million servers at their disposal).
As more things move closer to real time, I can't help but wonder about all those other companies out there that have hopped on the Hadoop/MapReduce bandwagon: when will they realize this and try once again to follow the crumbs that Google is dropping?
I just hope, for those organizations' sake, that they don't compete with Google in any way, because they will be at a severe disadvantage from a few angles:
- Google has a near-infinite amount of developer resources internally, and as one Yahoo! person said, "[Google] clearly demonstrated that it still has the organizational courage to challenge its own preconceptions."
- Google has near-infinite hardware capacity and economies of scale. What one company may pay $3,000 to $5,000 for, Google probably pays less than $1,000. They are the largest single organization that uses computers in the world. They are known for getting special CPUs, and everyone at cloud scale operates with specialized motherboard designs. They build their own switches and routers (maybe). Though last I heard, they are still a massive user of Citrix NetScaler load balancers.
- Google of course operates its own high-end, power-efficient data centers, which means they get many more servers per kW than you can get in a typical data center. I wrote earlier in the year about a new container that supports 45 kilowatts per rack, more than ten times your average data center.
- Google is the world's third largest internet carrier and due to peering agreements pays almost nothing for bandwidth.
Google will be releasing more information about their new system soon; I can already see the army of minions out there gearing up to try to duplicate the work and remain competitive. Ha ha! I wouldn't want to be them, that's all I can say.
[Google's] Lipkovitz stresses that he is "not claiming that the rest of the world is behind us."
Got to admire the modesty!
The report also says that 60 per cent of Google's traffic is now delivered directly to consumer networks. In addition to building out a network of roughly 36 data centers and co-locating in more than 60 public exchanges, the company has spent the past year deploying its Google Global Cache (GGC) servers inside consumer networks across the globe. Labovitz says that according to Arbor's anecdotal conversations, more than half of all consumer providers in North America and Europe now have at least one rack of Google's cache servers.
Honestly, I am speechless, beyond the word "frightened"; you may want to refer to an earlier blog post, "Lesser of two Evils," for more details.