TechOpsGuys.com Diggin' technology every day

21Sep/10Off

Online Schema Changes for MySQL

TechOps Guy: Nate

Looks like Facebook released a pretty cool tool that apparently provides the ability to perform MySQL schema changes online, something most real databases take for granted.

Another thing noted by our friends at The Register, was how extensively Facebook leverages MySQL. I was working on a project revolving around Apache Hadoop and someone that was involved with it was under the incorrect assumption that Facebook stores most of it's data on Hadoop.

At Facebook, MySQL is the primary repository for user data, with InnoDB the accompanying storage engine.
[..]
All Callaghan will say is that the company runs "X thousands" of MySQL servers. "X" is such a large number, the company needed a way of making index changes on live machines.

I wouldn't be surprised if they probably had a comparable number of MySQL servers to servers running Hadoop. After all Yahoo! is the biggest Hadoop user and at my last count had "only" about 25,000 servers running the software.

It certainly is unfortunate to see so many people out there see some sort of solution and think they can get it to solve all of their problems.

Hadoop is a good example, lots of poor assumptions are made around Hadoop. It's designed to do one thing really well, and it does that fairly well. But when you think you can adapt it into a more general purpose storage system it starts falling apart. Which is completely understandable, it wasn't designed for that purpose. Many people don't understand that simple concept though.

Another poor use of Hadoop is trying to shoehorn a real time application on top of it, it just doesn't work. Yet there are people out there (I've talked to some of them in person) who have devoted significant developer resources to try to attack that angle. Spend thirty minutes of time researching the topic and you can realize pretty quickly that it is a wasted effort. Google couldn't even do it!

Speaking of Hadoop, and Oracle for that matter it seems Oracle announced a Hadoop-style system yesterday at Open World, only Oracle's version seems to be orders of magnitutde faster (and more orders of magnitude expensive given the amount of flash it is using).

Using the skinnier and faster SAS disks, Oracle says that the Exadata X2-8 appliance can deliver up to 25GB/sec of raw disk bandwidth on uncompressed data and 50GB/sec across the flash drives. The disks deliver 50,000 I/O operations per second (IOPs), while the flash delivers 1 million IOPs. The machine has 100TB of raw disk capacity per rack and up to 28TB of uncompressed user data. The rack can load data at a rate of 5TB per hour. Using the fatter disks, the aggregate disk bandwidth drops to 14GB/sec, but the capacity goes up to 336TB and the user data space grows to 100TB.

The system is backed by an Infiniband-based network, I didn't notice specifics but assume 40Gbps per system.

Quite impressive indeed. Like Hadoop, this Exadata system is optimized for throughput, it can do IOPS pretty well too but it's clear that throughput is the goal. By contrast a more traditional SAN gets single digit gigabytes per second even on the ultra high end for data transfers at least on the industry standard SPC-2 benchmark.

  • IBM DS8700 rated at around 7.2 Gigabytes/second with 256 drives and 256GB cache costing a cool $2 million
  • Hitachi USP-V rated at around 8.7 Gigabytes/second with 265 drives and 128GB cache costing a cool $1.6 million

Now it's not really apples to apples comparison of course, but it can give some frame of reference.

It seems to scale really well according to Oracle -

Ellison is taking heart from the Exadata V2 data warehousing and online transaction processing appliance, which he said now has a $1.5bn pipeline for fiscal 2011. He also bragged that at Softbank, Teradata's largest customer in Japan, Oracle won a deal to replace 60 racks of Teradata gear with three racks of Exadata gear, which he said provided better performance and which had revenues that were split half-and-half on the hardware/software divide.

From 60 to 3? Hard to ignore those sorts of numbers!

Oh and speaking of Facebook, and Hadoop, and Oracle, as part of my research into the topic of Hadoop I came across this, I don't know how up to date it is but thought it was neat. Oracle DB is one product I do miss using, the company is filled with scumbags to be sure, I had to educate their own sales people on their licensing the last time I dealt with them. But it is a nice product, works really well, and IMO at least it's pretty easy to use especially with enterprise manager (cursed by DBAs from coast to coast I know!). Of course makes MySQL look like it's a text file based key-value pair database by comparison.

Anyways onto the picture!

Oh my god! Facebook is not only using Hadoop, but they are using MySQL, normal NAS storage, and even Oracle RAC! Who'da thunk it?

Find a tool or a solution that does everything well? The more generic the approach, the more difficult it is to pull it off, which is why so many solutions like that typically cost a significant amount of money, because there is significant value in what the product provides. If perhaps the largest open source platform in the world (Linux) has not been able to do it (how many big time open source advocates do you see running OS X and how many run OS X on their servers), who can?

That's what I thought.

(posted from my Debian Lenny workstation with updates from my Ubuntu Lucid Lynx laptop)
Tagged as: , Comments Off
12Sep/10Off

Google waves goodbye to Mapreduce

TechOps Guy: Nate

From the group of people that brought the Map Reduce algorithm to a much broader audience (despite the concepts being decades old), Google has now outgrown it and is moving on according to our friends at The Register.

The main reason behind it is map reduce was hindering their ability to provide near real time updates to their index. So they migrated their Search infrastructure to a Bigtable distributed database. They also optimized the next generation Google file system for this database, making it inappropriate for more general uses.

MapReduce is a sequence of batch operations, and generally, Lipkovits explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."

I have to wonder how much this new distributed database-based index was responsible for Google to be able to absorb upwards of a 7 fold increase in search traffic due to the Google Instant feature being launched.

I had an interview at a company a couple of months ago that was trying to use Hadoop + Map ReduceĀ  for near real-time operations (the product had not launched yet), and thought that wasn't a very good use of the technology. It's a batch processing system. Google of course realized this and ditched it when it could no longer scale to the levels of performance that they needed (despite having an estimated 1.8 million servers at their disposal).

As more things get closer to real time I can't help but wonder about all those other companies out there that have hopped on the Hadoop/Map Reduce bandwagon, when they will realize this and try once again to follow the food crumbs that Google is dropping.

I just hope for those organizations, that they don't compete with Google in any way, because they will be at a severe disadvantage from a few angles:

  • Google has a near infinite amount of developer resources internally and as one Yahoo! person said "[Google] clearly demonstrated that it still has the organizational courage to challenge its own preconceptions,"
  • Google has a near infinite hardware capacity and economies of scale. What one company may pay $3-5,000 for, Google probably pays less than $1,000. They are the largest single organization that uses computers in the world. They are known for getting special CPUs,. and everyone at cloud scale operates with specialized motherboard designs. They build their own switches and routers (maybe). Though last I heard they are still a massive user of Citrix Netscaler load balancers.
  • Google of course operates it's own high end, power efficient data centers which means they get many more servers per kW than you can get in a typical data center. I wrote earlier in the year about a new container that supports 45 kilowatts per rack, more than ten times your average data center.
  • Google is the world's third largest internet carrier and due to peering agreements pays almost nothing for bandwidth.

Google will be releasing more information about their new system soon, I can already see the army of minions out there gearing up to try to duplicate the work and try to remain competitive. ha ha! I wouldn't want to be them, that's all I can say :)

[Google's] Lipokovitz stresses that he is "not claiming that the rest of the world is behind us."

Got to admire the modesty!

Tagged as: , 1 Comment