Stupid me, here I was thinking that if you run MySQL in multi-master mode it should not have issues with writes coming in to multiple locations. I've never heard of any issues (though it's been a while since I've heard anyone talk about running multi-master MySQL themselves), but apparently there are some, as this guy indicates. He has a webinar about it on November 15th. It sounds interesting to me, though I'll be on the road that day and won't be able to listen in. Hopefully he posts the data afterwards.
At my current organization we do have multi-master MySQL, though we have yet to run it active-active (with writes going to both) for more than a short period at a time (usually just during fail-over events - "oh my god MySQL is about to crash, fail over!"). The load balancing is handled by our Netscalers and their MySQL-aware load balancing. Overall the load is low enough (avg under 10% CPU and avg 75 write IOPS/DB - all reads being served from RAM) that I don't think it'd provide any performance benefit to us anyway.
From the MySQL Performance Blog post -
This talk gives an overview and concrete examples of how writing across dual-masters can and will break your assumptions about ACID compliance, how you can work around it, and some alternative solutions that are on the market today that attempt to address this problem. This will be a great session for DBAs just getting into this problem space, are moving from hot-cold architectures to hot-warm or even hot-hot, and even for developers to get a sense of the difficultly of this problem.
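To get a feel for what "break your assumptions" might mean, here is a toy sketch of two masters replicating asynchronously. This is not MySQL and not the webinar's material; the `Master` class and `simulate` function are names I made up, and the model assumes each master ships the resulting row value to its peer, which applies it with no conflict detection.

```python
# Toy model of dual-master asynchronous replication (illustration only).
class Master:
    def __init__(self, name):
        self.name = name
        self.rows = {}    # primary key -> value
        self.binlog = []  # changes not yet shipped to the peer

    def execute(self, key, fn):
        """Run a read-modify-write locally and log the resulting value."""
        new_value = fn(self.rows.get(key, 0))
        self.rows[key] = new_value
        self.binlog.append((key, new_value))

    def replicate_to(self, peer):
        """Ship logged changes; the peer applies them blindly."""
        for key, value in self.binlog:
            peer.rows[key] = value
        self.binlog.clear()


def simulate():
    m1, m2 = Master("m1"), Master("m2")
    m1.rows["balance"] = 100
    m2.rows["balance"] = 100

    # Two clients hit different masters before replication catches up.
    m1.execute("balance", lambda b: b - 30)  # m1 now holds 70
    m2.execute("balance", lambda b: b - 50)  # m2 now holds 50

    # Each side then overwrites the other with its own result.
    m1.replicate_to(m2)
    m2.replicate_to(m1)
    return m1.rows["balance"], m2.rows["balance"]


if __name__ == "__main__":
    print(simulate())
```

In this model both withdrawals "commit", yet neither master ends up with the 20 you would expect from applying them in sequence, and the two copies no longer agree with each other.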
This seems pretty neat. Not long ago Fusion IO announced their first new real product refresh in quite a while which offers significantly enhanced performance.
Fusion-io also announced a new extension to its VSL (Virtual Storage Layer) software subsystem for conducting Atomic Writes in the popular MySQL open source database. Atomic Writes are an operation in which a processor can simultaneously write multiple independent storage sectors as a single storage transaction. This accelerates mySQL and gives new features powered by the flexibility of sophisticated flash architectures. With the new Atomic Writes extension, Fusion-io testing has observed 35 percent more transactions per second and a 2.5x improvement in performance predictability compared to conducting the same MySQL tests without the Atomic Writes feature.
I know that Facebook is a massive user of Fusion IO for their MySQL database farms, and I suspect this feature was made for them! Though it can benefit everyone.
My only question would be: can this Atomic Writes capability be used by MySQL when running through the ESX storage layer, or does there need to be more native access from the OS?
About the new product lines, from The Register -
The ioDrive 2 comes in SLC form with capacities of 400GB and 600GB. It can deliver 450,000 write IOPS working with 512-byte data blocks and 350,000 read IOPS. These are whopping great increases, 3.3 times faster for the write IOPS number, over the original ioDrive SLC model which did 135,000 write IOPS and 140,000 read IOPS. It delivered sequential data at 750-770MB/sec whereas the next-gen product does it at 1.5GB/sec, around two times faster.
All the products will ship in November. Prices start from $5,950
The cards aren't available yet, so I wonder how accurate those numbers will end up being. But in any case, even if they were overinflated by a large amount, that's still an amazing amount of I/O.
On a related note, I was just browsing the Fusion IO blog, which mentions this MySQL functionality as well, and saw that Fusion IO was/is showing off a beefy 8-way HP DL980 with 14 HP-branded IO accelerators at Oracle OpenWorld -
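For what it's worth, the ratios in The Register's numbers are easy to check. A quick sketch, assuming 1.5GB/sec means 1,500MB/sec and using the upper end (770MB/sec) of the quoted range for the old card; the dictionary keys and the `speedups` helper are my own names:

```python
# Figures quoted by The Register for the SLC ioDrive vs. ioDrive 2.
old = {"write_iops": 135_000, "read_iops": 140_000, "seq_mb_s": 770}
new = {"write_iops": 450_000, "read_iops": 350_000, "seq_mb_s": 1_500}


def speedups(old, new):
    """Return new/old for every metric, rounded to two decimals."""
    return {metric: round(new[metric] / old[metric], 2) for metric in old}


if __name__ == "__main__":
    for metric, ratio in speedups(old, new).items():
        print(f"{metric}: {ratio}x")
```

So the write figure matches the quoted "3.3 times", reads come out at 2.5 times, and sequential throughput is just under two times.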
We're running Oracle Enterprise Edition database version 11g Release 2 on a single eight processor HP ProLiant DL980 G7 system integrated with 14 Fusion ioMemory-based HP IO Accelerators, achieving performance of more than 600,000 IOPS with over 6GB/s bandwidth using a real world, read/write mixed workload.
the HP Data Accelerator Solution for Oracle is configured with up to 12TB of high performance flash[..]
After reading that I could not help but think how HP's own Vertica, with its extremely optimized encoding and compression scheme, would run on such a beast. I mean if you can get 10x compression out of the system (Vertica's best-case real world is 30:1 for reference), get a pair of these boxes (Vertica would mirror between the two) and you have upwards of 240TB of data to play with.
I say 240TB because of the way Vertica mirrors the data: it allows you to store it in a different sort order on the mirror, allowing for even faster access if you're querying the data in different ways. Who knows - with the compression you may be able to get much better than 10:1 depending on your data.
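The back-of-the-envelope math behind that 240TB figure, as a small sketch. It assumes the full 12TB of flash per server is usable and counts the mirrored copy as data you can query, as I did above; `effective_capacity_tb` is just a name I picked for the helper.

```python
def effective_capacity_tb(raw_tb_per_server, compression_ratio, servers):
    """Logical data held across all servers at a given compression ratio."""
    return raw_tb_per_server * compression_ratio * servers


if __name__ == "__main__":
    # Two DL980s with 12TB of flash each, at 10:1 compression.
    print(effective_capacity_tb(12, 10, 2))
    # Same pair at the best-case 30:1 ratio.
    print(effective_capacity_tb(12, 30, 2))
```

At 10:1 that works out to 240TB across the pair, and the 30:1 best case would put it at 720TB.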
Vertica is so fast that you will probably end up CPU bound more than anything else - 80 cores per server is quite a lot though! The DL980 supports up to 16 PCI Express slots, so even with 14 cards that still leaves room for a couple of 10GigE ports and/or Fibre Channel or some other form of connectivity other than what's on the motherboard (which seems to have an optional dual port 10GbE NIC).
With Vertica's licensing (last I checked) starting in the 10s of thousands of dollars per raw TB (before compression), it falls into the category where I'd blow a ton of money on hardware to make it run the best it possibly can (same goes for Oracle - though Standard Edition to a lesser degree). Vertica is coming out with a Community Edition soon which I believe is free. I don't recall what the restrictions are; I think one of them was that it is limited to a single server. I haven't yet heard what the storage limits might be (I'd assume there would be some limit, maybe half a TB or something).
Just a quick post, came across this on the MySQL Performance blog and thought it was a really well written paper. Talks about vertical scaling in the most current versions of MySQL, what the major bottlenecks are when scaling with more CPU cores, and how to extract the highest amount of I/O out of today's modern server hardware.
What I'd like to see just for comparison purposes is running the latest & greatest MySQL, vertically scaled to 48 cores, and comparing it against Oracle Standard Edition on the same 48 cores. As far as I know the Oracle license agreement forbids publishing performance numbers so I'll probably never see this, but it is a curiosity of mine, because sharding a database can make application development significantly more complex.
It is nice though that the latest versions of MySQL can scale beyond four cores.
Another thing noted by our friends at The Register was how extensively Facebook leverages MySQL. I was working on a project revolving around Apache Hadoop, and someone who was involved with it was under the incorrect assumption that Facebook stores most of its data on Hadoop.
At Facebook, MySQL is the primary repository for user data, with InnoDB the accompanying storage engine.
All Callaghan will say is that the company runs "X thousands" of MySQL servers. "X" is such a large number, the company needed a way of making index changes on live machines.
I wouldn't be surprised if they had a comparable number of MySQL servers to servers running Hadoop. After all, Yahoo! is the biggest Hadoop user and at my last count had "only" about 25,000 servers running the software.
It certainly is unfortunate to see so many people out there see some sort of solution and think they can get it to solve all of their problems.
Hadoop is a good example; lots of poor assumptions are made around Hadoop. It's designed to do one thing, and it does that fairly well. But when you think you can adapt it into a more general purpose storage system it starts falling apart. Which is completely understandable, since it wasn't designed for that purpose. Many people don't understand that simple concept though.
Another poor use of Hadoop is trying to shoehorn a real time application on top of it. It just doesn't work. Yet there are people out there (I've talked to some of them in person) who have devoted significant developer resources to try to attack that angle. Spend thirty minutes researching the topic and you can realize pretty quickly that it is a wasted effort. Google couldn't even do it!
Speaking of Hadoop, and Oracle for that matter, it seems Oracle announced a Hadoop-style system yesterday at OpenWorld, only Oracle's version seems to be orders of magnitude faster (and orders of magnitude more expensive given the amount of flash it is using).
Using the skinnier and faster SAS disks, Oracle says that the Exadata X2-8 appliance can deliver up to 25GB/sec of raw disk bandwidth on uncompressed data and 50GB/sec across the flash drives. The disks deliver 50,000 I/O operations per second (IOPs), while the flash delivers 1 million IOPs. The machine has 100TB of raw disk capacity per rack and up to 28TB of uncompressed user data. The rack can load data at a rate of 5TB per hour. Using the fatter disks, the aggregate disk bandwidth drops to 14GB/sec, but the capacity goes up to 336TB and the user data space grows to 100TB.
The system is backed by an InfiniBand-based network. I didn't notice specifics but assume 40Gbps per system.
Quite impressive indeed. Like Hadoop, this Exadata system is optimized for throughput. It can do IOPS pretty well too, but it's clear that throughput is the goal. By contrast, a more traditional SAN gets single digit gigabytes per second for data transfers even on the ultra high end, at least on the industry standard SPC-2 benchmark.
- IBM DS8700 rated at around 7.2 Gigabytes/second with 256 drives and 256GB cache costing a cool $2 million
- Hitachi USP-V rated at around 8.7 Gigabytes/second with 265 drives and 128GB cache costing a cool $1.6 million
Now it's not really an apples-to-apples comparison of course, but it can give some frame of reference.
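To put that frame of reference into numbers, here is a rough sketch using only the figures quoted above. It compares Exadata's 25GB/sec raw disk bandwidth against the two SPC-2 results and works out what each SAN costs per GB/sec. The names are mine, and the same apples-to-oranges caveat applies, since the Exadata number is Oracle's own claim and not an SPC-2 result.

```python
EXADATA_DISK_GB_S = 25  # Oracle's raw disk bandwidth claim, skinny SAS disks

# SPC-2 throughput (GB/sec) and price (USD) as quoted above.
sans = {
    "IBM DS8700": {"gb_s": 7.2, "price": 2_000_000},
    "Hitachi USP-V": {"gb_s": 8.7, "price": 1_600_000},
}


def compare(sans, reference_gb_s):
    """Per SAN: how many times faster the reference is, and USD per GB/sec."""
    return {
        name: {
            "reference_speedup": round(reference_gb_s / s["gb_s"], 1),
            "usd_per_gb_s": round(s["price"] / s["gb_s"]),
        }
        for name, s in sans.items()
    }


if __name__ == "__main__":
    for name, row in compare(sans, EXADATA_DISK_GB_S).items():
        print(name, row)
```

By this math the Exadata disk figure is roughly three times what either SAN posts, and the SANs land somewhere around $180,000 to $280,000 per GB/sec of throughput.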
It seems to scale really well according to Oracle -
Ellison is taking heart from the Exadata V2 data warehousing and online transaction processing appliance, which he said now has a $1.5bn pipeline for fiscal 2011. He also bragged that at Softbank, Teradata's largest customer in Japan, Oracle won a deal to replace 60 racks of Teradata gear with three racks of Exadata gear, which he said provided better performance and which had revenues that were split half-and-half on the hardware/software divide.
From 60 to 3? Hard to ignore those sorts of numbers!
Oh and speaking of Facebook, and Hadoop, and Oracle: as part of my research into the topic of Hadoop I came across this. I don't know how up to date it is but thought it was neat. Oracle DB is one product I do miss using. The company is filled with scumbags to be sure, and I had to educate their own sales people on their licensing the last time I dealt with them. But it is a nice product, works really well, and IMO at least it's pretty easy to use, especially with Enterprise Manager (cursed by DBAs from coast to coast, I know!). Of course it makes MySQL look like a text file based key-value pair database by comparison.
Anyways onto the picture!
Oh my god! Facebook is not only using Hadoop, but they are using MySQL, normal NAS storage, and even Oracle RAC! Who'da thunk it?
Find a tool or a solution that does everything well? The more generic the approach, the more difficult it is to pull off, which is why so many solutions like that typically cost a significant amount of money: there is significant value in what the product provides. If perhaps the largest open source platform in the world (Linux) has not been able to do it (how many big time open source advocates do you see running OS X, and how many run OS X on their servers?), who can?
That's what I thought. (posted from my Debian Lenny workstation with updates from my Ubuntu Lucid Lynx laptop)