Crazy Seagate Statistics

TechOps Guy: Nate

It's been a while since I clicked on their blog, but I just did, and as the most current entry says, these statistics are pretty eye-popping.

  • A drive’s recording head hovers above the disks at a height of 100 atoms, 100 times thinner than a piece of paper
  • Seagate clean rooms are 100 times cleaner than a hospital operating room
  • Seagate can analyze over 1.5 million drives at a time
  • Seagate builds 6 hard drives, hybrid drives, and solid state drives every second
  • Every single drive travels through over 1000 manufacturing steps

[Begin First Tangent --]

If you're using a Seagate SATA disk, do yourself a favor and don't let the temperature of the drive drop below 20 degrees Celsius :)

I read an interesting article recently on the various revenue numbers of the big drive manufacturers, and the numbers were surprising to me.

Hitachi GST had revenues of $4.8bn in 2009.
Seagate had fiscal 2010 revenue of $11.4bn.
Western Digital's latest annual revenue was $9.8bn.

I really had no idea Western Digital was so big! After all, since they do not (and I'm not sure they ever did) participate in the SCSI / Fibre Channel / SAS arena, that leaves them out of the enterprise space for the most part (I never really saw their Raptor line of drives get adopted, too bad!). Of course "Enterprise SATA" has taken off quite a bit in recent years, but I would think that would still pale in comparison to enterprise SAS/SCSI/FC. But maybe not. I don't know, I haven't looked into the details.

I thought Hitachi was a lot bigger, especially since Hitachi bought the disk division from IBM way back when. I used to be a die-hard fan of IBM disks, up until the 75GXP fiasco. I'm still wary of them even now. I still have a CD-ROM filled with "confidential" information regarding the class action suit that I played a brief part in (the judge kicked me out because he wanted to limit the people in the suit to folks in California). Very interesting stuff, not that I remember much of it; I haven't looked at it in years.

The 75GXP was the only drive where I've ever suffered a "double disk failure" before I could get a replacement in. It only happened once. My company had 3 "backup" servers, one at each office site. Each one had, I think, 5 x 100GB disks (or maybe it was another size, this was back in 2001), in RAID 5 connected to a 3Ware 7000-series controller.

One Friday afternoon one of the disks in my local office failed, so I called to get an RMA. About 2 hours later another disk failed in a remote office, so I called to get that one RMA'd too. The next day the replacement disk for my local server arrived, but it was essentially DOA from what I recall, so the system kept running in degraded mode (come on, how many people's servers in 2001 had hot spares? That's what I thought). There was nobody in the office for the other server in degraded mode, so that drive was set to arrive on Monday to be replaced. On Sunday that same weekend a 2nd disk in the remote server failed, killing the RAID array of course.

In the end, that particular case wasn't a big deal. It was a backup server after all, and everything on the disk was duplicated at least once to another site. But it was still a pain. If memory serves, I had a good 15-20 75GXP disks fail over the period of a year or so (both home and work), all of them in what I would consider low duty cycle use, hardly being stressed that much. In all cases the data lost wasn't a big deal; it was more of a big deal to be re-installing the systems, which took more time than anything else. Especially the Solaris systems.
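For anyone wondering why the second failure is the one that kills a RAID 5 array, here is a minimal sketch of the XOR parity idea behind it. This is a toy illustration, not how the 3Ware controller actually lays data out: the block contents and the `xor_blocks` helper are made up for the example, and real RAID 5 rotates the parity block across the disks stripe by stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 5-disk set: four data blocks plus one parity block.
data = [b"disk", b"fail", b"once", b"okay"]
parity = xor_blocks(data)
stripe = data + [parity]

# One disk lost: XOR of the four surviving blocks gives back the missing one.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[lost])

# Two disks lost: XOR of the remaining three only yields the two missing
# blocks XORed together, which is not enough to recover either of them.
lost_two = (1, 2)
remaining = [blk for i, blk in enumerate(stripe) if i not in lost_two]
mixed = xor_blocks(remaining)
print(mixed == xor_blocks([data[1], data[2]]))
```

With one block missing the parity math has exactly one unknown to solve for; with two missing there is one equation and two unknowns, which is why degraded mode is such an uncomfortable place to spend a weekend.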

[End First Tangent --]
[Begin Second Tangent --]

One thing that brings back fond childhood memories related to Seagate is where they are based - Scotts Valley, California. Myself, I wouldn't consider it part of Silicon Valley itself, but it is about as close as you can get. I spent a lot of time in Scotts Valley as a kid. I grew up in Boulder Creek, California (up until I was about 12 anyways), which is about 10 miles from Scotts Valley. I considered it (and it probably still is) the first "big town" near home, one that had things like movie theaters and arcades. I didn't find out Seagate was based there until a few years ago, but for some reason it makes me proud(?) for such a big giant to be located in such a tiny town so close to what I consider home.

[End Second Tangent --]

Google waves goodbye to MapReduce

TechOps Guy: Nate

Google, the group of people that brought the MapReduce algorithm to a much broader audience (despite the concepts being decades old), has now outgrown it and is moving on, according to our friends at The Register.

The main reason behind it is that MapReduce was hindering their ability to provide near real time updates to their index. So they migrated their search infrastructure to a Bigtable distributed database. They also optimized the next generation Google file system for this database, making it inappropriate for more general uses.

MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."
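The "gets larger as you increase the number of operations" part is easy to put rough numbers on. Here is a minimal sketch, assuming every operation in the chain fails independently with the same probability; both that assumption and the 1% figure are mine, not Google's, and `chain_failure_probability` is just a name made up for the example.

```python
def chain_failure_probability(p_step_fails, n_steps):
    """Chance that at least one of n_steps independent operations goes wrong."""
    # The whole chain succeeds only if every single step succeeds.
    return 1.0 - (1.0 - p_step_fails) ** n_steps

# Assumed 1% chance of trouble per operation, purely for illustration.
for n in (1, 10, 50, 100):
    print(n, round(chain_failure_probability(0.01, n), 3))
```

Even with a small per-step chance of trouble, the odds of something going wrong somewhere keep climbing as the chain of map-reduces gets longer, and since each phase waits on the one before it, a single straggler holds up everything behind it.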

I have to wonder how much this new distributed database-based index was responsible for Google being able to absorb upwards of a 7-fold increase in search traffic due to the Google Instant feature being launched.

I had an interview at a company a couple of months ago that was trying to use Hadoop + MapReduce for near real-time operations (the product had not launched yet), and I thought that wasn't a very good use of the technology. It's a batch processing system. Google of course realized this and ditched it when it could no longer scale to the levels of performance that they needed (despite having an estimated 1.8 million servers at their disposal).

As more things get closer to real time, I can't help but wonder about all those other companies out there that have hopped on the Hadoop/MapReduce bandwagon, and when they will realize this and try once again to follow the bread crumbs that Google is dropping.

I just hope, for those organizations' sake, that they don't compete with Google in any way, because they will be at a severe disadvantage from a few angles:

  • Google has a near infinite amount of developer resources internally, and as one Yahoo! person said, "[Google] clearly demonstrated that it still has the organizational courage to challenge its own preconceptions."
  • Google has near infinite hardware capacity and economies of scale. What one company may pay $3,000-5,000 for, Google probably pays less than $1,000. They are the largest single organization that uses computers in the world. They are known for getting special CPUs, and everyone at cloud scale operates with specialized motherboard designs. They build their own switches and routers (maybe). Though last I heard they are still a massive user of Citrix Netscaler load balancers.
  • Google of course operates its own high end, power efficient data centers, which means they get many more servers per kW than you can get in a typical data center. I wrote earlier in the year about a new container that supports 45 kilowatts per rack, more than ten times what your average data center supports.
  • Google is the world's third largest internet carrier and due to peering agreements pays almost nothing for bandwidth.

Google will be releasing more information about their new system soon. I can already see the army of minions out there gearing up to try to duplicate the work and remain competitive. Ha ha! I wouldn't want to be them, that's all I can say :)

[Google's] Lipkovitz stresses that he is "not claiming that the rest of the world is behind us."

Got to admire the modesty!

