TechOpsGuys.com Diggin' technology every day

27Sep/10Off

Bye Bye 3PAR, Hello HP!

TechOps Guy: Nate

Wow that was fast! HP completed its purchase of 3PAR this morning.

HP today announced that it has completed the acquisition of 3PAR Inc., a leading global provider of utility storage, for a price of $33 per share in cash, or an enterprise value of $2.35 billion.

3PAR technologies expand HP’s storage portfolio into enterprise-class public and private cloud computing environments, which are key growth markets for HP. Complementary with HP’s current storage portfolio, 3PAR brings market-differentiating technology to HP that will enable clients to maximize storage utilization, balance workloads and automate storage tiering. This allows clients to improve productivity and more efficiently operate their storage networks.

With a worldwide sales and channel network, coupled with extensive service operations, HP is uniquely positioned to rapidly expand 3PAR’s market opportunity. As part of the HP Converged Infrastructure portfolio, which integrates servers, storage, networking and management technologies, 3PAR solutions will further strengthen HP’s ability to simplify data center environments for clients.

Further details on product integration will be announced at a later date.

Certainly not messing around!

26Sep/10Off

Still waiting for Xiotech..

TechOps Guy: Nate

So I was browsing the SPC-1 pages again to see if there was anything new and lo and behold, Xiotech posted some new numbers.

But once again, they appear too timid to release numbers for their 7000 series, or the 9000 series that came out somewhat recently. Instead they prefer to extrapolate performance from their individual boxes and aggregate the results. That doesn't count of course; performance can be radically different at higher scale.

Why do I mention this? Well, nearly a year ago their CEO blogged in response to one of my posts, and that was one of the first times I made news in The Register (yay! I really was excited), and in part the CEO said:

Responding to the Techopsguy blog view that 3PAR's T800 outperforms an Emprise 7000, the Xiotech writer claims that Xiotech has tested "a large Emprise 7000 configuration" on what seems to be the SPC-1 benchmark; "Those results are not published yet, but we can say with certainty that the results are superior to the array mentioned in the blog (3PAR T800) in several terms: $/IOP, IOPS/disk and IOPS/controller node, amongst others."

So here we are almost a year later, and more than one SPC-1 result later, and still no sign of Xiotech's SPC-1 numbers for their higher end units. I'm sorry but I can't help but feel they are hiding something.

If I were them I would put my customers more at ease by publishing said numbers, and be prepared to justify the results if they don't match up to Xiotech's extrapolated numbers from the 5000 series.

Maybe they are worried they might end up like Pillar, whose CEO was pretty happy with their SPC-1 results. Shortly afterwards the 3PAR F400 launched and absolutely destroyed the Pillar numbers from every angle. You can see more info on these results here.

At the end of the day I don't care of course, it just was a thought in my head and gave me something to write about :)

I just noticed that these past two posts put me over the top as far as the most posts I have done in a month since this TechOpsGuys thing started. I'm glad I have my friends Dave, Jake and Tycen generating tons of content too, after all this site was their idea!

26Sep/10Off

Overhead associated with scale out designs

TechOps Guy: Nate

Was reading a neat article over at The Register again about the new Google indexing system. This caught my eye:

"The TPC-E results suggest a promising direction for future investigation. We chose an architecture that scales linearly over many orders of magnitude on commodity machines, but we’ve seen that this costs a significant 30-fold overhead compared to traditional database architectures."

Kind of makes you think... I guess if you're operating at the scale they are, the overhead is not a big deal; they'll probably find a way to reduce (ha ha, map reduce, get it? sorry) it over time.

23Sep/10Off

Using open source: how do you give back?

TechOps Guy: Nate

After reading an article on The Register (yeah you probably realize by now I spend more time on that site online than pretty much any other site), it got me thinking about a topic that bugs me.

The article is from last week but is written by the CEO of the organization behind Ubuntu. It basically talks about how using open source software is a good way to save costs in a down (or up) economy, and tries to give a bunch of examples of companies basing their stuff on open source.

That's great, I like open source myself, fired up my first Slackware Linux box in 1996 I think it was(Slackware 3.0). I remember picking Slackware over Red Hat at the time specifically because Slackware was known to be more difficult to use and it would force me to learn Linux the hard way, and believe me I learned a lot. To this day people ask me what they should study or do to learn Linux and I don't have a good answer, I don't have a quick and easy way to learn Linux the way I learned it. It takes time, months, years of just playing around with it. With so many "easy" distributions these days I'm not sure how practical my approach is now but I'm getting off topic here.

So back to what bugs me. What bugs me is people out there, or more specifically organizations out there, that do nothing but leech off of the open source community. Companies that may make millions (or billions!) in revenue in large part because they are leveraging free stuff. But it's not the usage of the free stuff that I have a problem with, more power to them. I get annoyed when those same organizations feel absolutely no moral obligation to contribute back to those that have given them so much.

You don't have to do much. Over the years the most that I have contributed back has been participating in mailing lists, whether it is the Debian users list (been many years since I was active there), the Red Hat mailing list (few years), or the CentOS mailing list (several months). I try to help where I can. I have a good deal of Linux experience, which often means nobody else on the list has answers to the questions I have. But I do (well, did) answer a ton of questions. I'm happy to help. I'm sure at some point I will re-join one of those lists (or maybe another one) and help out again, but I've been really busy these past few months. I remember even buying a bunch of Loki games to try to do my part in helping them (despite the games not being open source, they were supporting Linux indirectly), several of which I never ended up playing (not much of a gamer). VMware of course was also a really early Linux supporter (I still have my VMware 1.0.2 Linux CD; I believe that was the first version they released on CD, previous versions were download only), though I have gotten tired of waiting for vCenter for Linux.

The easiest way for a corporation to contribute back is to, say, use and pay for Red Hat Enterprise, or SuSE or whatever. Pay the companies that hire the developers to make the open source software go. I'm partial to Red Hat myself, at least in a business environment, though I use Debian-based distributions in my personal life.

There are a lot of big companies that do contribute code back, and that is great too, if you have the expertise in house. Opscode is one such company I have been working with recently on their Chef product. They leverage all sorts of open source stuff in their product (which in itself is open source). I asked them what their policy is for getting things fixed in the open source code they depend on, do they just file bugs and wait or do they contribute code, and they said they contribute a bunch of code, constantly. That's great, I have enormous respect for organizations that are like that.

Then there are the companies that leech off open source and not only don't officially contribute in any way whatsoever but actively prevent their own employees from doing so. That's really frustrating & stupid.

Imagine where Linux, and everything else would be if more companies contributed back. It's not hard, go get a subscription to Red Hat, or Ubuntu or whatever for your servers (or desktops!). You don't have to contribute code, and if you can't contribute back in the form of supporting the community on mailing lists, or helping out with documentation, or the wikis or whatever, write a check. You actually get something in return, it's not like it's a donation. But donations are certainly accepted by the vast numbers of open source non-profits.

HP has been a pretty big backer of open source for a long time, they've donated a lot of hardware to support kernel.org and have been long time Debian supporters.

Another way to give back is to leverage your infrastructure: if you have a lot of bandwidth or excess server capacity or disk space or whatever, set up a mirror, sponsor a project. Looking at the Debian page as an example it seems AboveNet is one such company.

I don't use open source everywhere, I'm not one of those folks who has to make sure everything is GPL or whatever.

So all I ask, is the next time you build or deploy some project that is made possible by who knows how many layers of open source products, ask yourself how you can contribute back to support the greater good. If you have already then I thank you :)

Speaking of Debian, did you know that Debian powers 3PAR storage systems? Well, it did at one point; I haven't checked recently. I do recall telnetting to my arrays on port 22 and seeing a Debian SSH banner. The underlying Linux OS was never exposed to the user. And it seems 3PAR reports bugs, which is another important way to contribute back. And, as of 3PAR's 2.3.1 release (I believe), they finally officially started supporting Debian as a platform to connect to their storage systems. By contrast they do not support CentOS.

Extreme Networks' ExtremeWare XOS is also based on Linux, though I think it's a special embedded version. I remember in the early days they didn't want to admit it was Linux; they said "Unix based". I just dug this up from a backup from back in 2005. Once I saw this on my core switch booting up I was pretty sure it was Linux!
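For anyone who wants to repeat that kind of check, here is a minimal sketch. It assumes the device answers on TCP port 22 and sends its SSH identification line first, as SSH servers normally do. The function names are made up for illustration, and the sample banner mentioned afterwards only shows the general shape of a Debian OpenSSH banner; it is not something captured from a 3PAR array.

```python
import socket


def read_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(256)
    if not data:
        return ""
    return data.decode("ascii", errors="replace").splitlines()[0]


def looks_like_debian(banner: str) -> bool:
    """True if an SSH identification string mentions Debian.

    Debian's OpenSSH packages have traditionally appended a Debian tag
    to the version string, which is what makes this check possible.
    """
    return banner.startswith("SSH-") and "debian" in banner.lower()
```

Calling `looks_like_debian(read_banner("array.example.com"))` (a placeholder hostname) would do the same thing as the telnet trick; a banner shaped like `SSH-2.0-OpenSSH_5.1p1 Debian-5` is the kind of thing to look for.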

Extreme Networks Inc. BD 10808 MSM-R3 Boot Monitor
Version 1.0.1.5 Branch mariner_101b5 by release-manager on Mon 06/14/04
Copyright 2003, Extreme Networks, Inc.
Watchdog disabled.
Press and hold the <spacebar> to enter the bootrom.

Boot path is /dev/fat/wd0a/vmlinux
(elf)
0x85000000/18368 + 0x85006000/6377472 + 0x8561b000/12752(z) + 91 syms/
Running image boot...

Starting Extremeware XOS 11.1.2b3
Copyright (C) 1996-2004 Extreme Networks.  All rights reserved.
Protected by U.S. Patents 6,678,248; 6,104,700; 6,766,482; 6,618,388; 6,034,957

Then there's my Tivo that runs Linux, my TV runs Linux (Philips TV), my Qlogic FC switches run Linux, I know F5 equipment runs on Linux, my phone runs Linux (Palm Pre). It really is pretty crazy how far Linux has come in the past 10 years. And I'm pretty convinced the GPL played a big part, making it more difficult to fork it off and keep the changes for yourself. A lot of momentum built up in Linux and companies and everyone just flocked to it. I do recall early F5 load balancers used BSDI, but switched over to Linux (didn't the company behind BSDI go out of business earlier this decade? or maybe they got bought, I forget). Seems Linux is everywhere and in most cases you never notice it. The only way I knew it was in my TV is because the instructions came with all sorts of GPL disclosures.

In theory the BSD licensing scheme should make the *BSDs much more attractive, but for the most part *BSD has not been able to keep pace with Linux (outside some specific niches; I do love OpenBSD's pf), so it never really got anywhere close to the critical mass Linux has.

Of course now someone will tell me some big fancy device that runs BSD that is in every data center, every household and I don't know it's there! If I recall right I do remember that Juniper's JunOS is based on FreeBSD? And I think Force10 uses NetBSD.

I also recall being told by some EMC consultants back in 2004/2005 that the EMC Symmetrix ran Linux too. I do remember the Clariions of the time (at least, maybe still) ran Windows (probably because EMC bought the company that made that product rather than creating it themselves).

22Sep/10Off

The Cloud: Grenade fishing in a barrel

TechOps Guy: Nate

I can't help but laugh. I mean I've been involved in several initiatives surrounding the cloud. So many people out there think the cloud is efficient and cost effective. Whoever came up with the whole concept deserves to have their own island (or country) by now.

Because, really, competing against the cloud is like grenade fishing in a barrel. Shooting fish in a barrel isn't easy enough, really it's not!

Chuck from EMC earlier in the year talked to the folks at Pfizer around their use of the Amazon cloud, and the real story behind it. Interesting read, really shows the value you can get from the cloud if you use it right.

R+D’s use of HPC resources is unimaginably bursty and diverse, where on any given day one of 1000 different applications will be run. Periodically enormous projects (of very short duration!) come up very quickly, driven by new science or insights, which sometimes are required to make key financial or  strategic decisions with vast amounts of money at stake for the business.

As a result, there's no real ability to forecast or plan in any sort of traditional IT sense.  The HPC team has to be able to respond in a matter of days to huge requests for on-demand resources -- far outside the normal peaks and valleys you'd find in most traditional IT settings.

But those use cases at the moment really are few and far between. Contrast that with the use cases for having your own cloud (of sorts); there is a lot more use there. It would not surprise me if over time Pfizer continues to expand its internal HPC stuff as it gets more of a grasp on what the average utilization rate is, and hosts more and more stuff internally vs going to Amazon. It's just that in the early days of this they don't have enough data to predict how much they need. They may never get completely out of the cloud; I'm just saying that the high watermark (for lack of a better term) can be monitored so that there is less significant "bursting" to the cloud.

Now if Pfizer is unable to ever really get a grip on forecasting their HPC requirements, well, then they might just keep using the cloud, but I suspect at the end of the day they will get better forecasting. They obviously have the talent internally to do this very tricky balance of cloud and internal HPC. The cloud people would have you believe it's a simple thing to do, it's really not. Especially for off the shelf applications. If you have seen the numbers I have seen, you'd shake your head too. Sort of the response I had when I did come across a real good use case for the cloud earlier this year.

I could see paying a lot more for premium cloud services if I got more, but I don't get more, in fact I get less, a LOT less, than doing it myself. Now for my own personal "server" that is in the Terremark cloud I can live with it, not a big deal, my needs are tiny. (Though now that I think about it, they couldn't even give me a 2nd NAT address for a 2nd VM for SMTP purposes; I had to create a 2nd account to put my 2nd VM in to get my 2nd NAT address. Costs for me are the same regardless, but it is a bit more complicated than it should be, and opening a 2nd account in their system caused all sorts of problems with their back end, which seemed to get confused by having two accounts with the same name. I had to engage support on more than one occasion to get all the issues fixed.) But for real work stuff, no way.

So many sheep out there still buy the hype - hook, line and sinker.

Which can make jobs for people like me harder. I've heard the story time and time again from several different people in my position: PHBs are so sold on the cloud concept they can't comprehend why it's so much more expensive than doing it yourself, so they want you to justify it six ways from Sunday (if that's the right phrase). They know there's something wrong with your math but they don't know what it is, so they want you to try to prove yourself wrong when you're not. At the end of the day it works out though, it just takes some time to break that glass ceiling (again it sounds like the right term but it might not be).

Then there's the argument the cloud people make. I was involved in one deal earlier in the year, usual situation, and the cloud providers said "well do you really have the staff to manage all of this?" I said "IT IS A RACK AND A HALF OF EQUIPMENT, HOW MANY PEOPLE DO I NEED, REALLY?" They were just as oblivious to that as the PHBs were to the cloud costs.

While I'm thinking of Wikipedia, has anyone else experienced massive slowdowns with their DNS infrastructure? It takes FOREVER to resolve their domains for me. All other domains resolve really fast. I run my own DNS, so maybe there is something wrong with it; I'm not sure, I haven't investigated.
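A quick way to put numbers on "FOREVER" is to time the lookups. The sketch below is only one way to do it: it goes through whatever resolver the operating system is configured to use, so local caching can hide a slow upstream, and the function names are made up for illustration.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the seconds taken to resolve hostname once."""
    start = time.perf_counter()
    resolver(hostname, None)
    return time.perf_counter() - start


def compare_lookups(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to its lookup time, or None if resolution failed."""
    results = {}
    for name in hostnames:
        try:
            results[name] = time_lookup(name, resolver)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Running `compare_lookups(["wikipedia.org", "example.com"])` a few times, and comparing the first (uncached) run against later ones, should show whether one domain really is the outlier.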

21Sep/10Off

Online Schema Changes for MySQL

TechOps Guy: Nate

Looks like Facebook released a pretty cool tool that apparently provides the ability to perform MySQL schema changes online, something most real databases take for granted.
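For readers who have not seen the trick, tools in this space generally build a shadow table with the new schema, mirror live writes into it with triggers, copy the existing rows across, and then swap the names. The sketch below shows that general pattern only. It uses SQLite from Python's standard library as a stand-in for MySQL so it can run anywhere, the table and trigger names are invented, and it is not a description of how Facebook's tool is implemented. It also assumes primary keys never change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (id, name) VALUES (1, 'alice');
    INSERT INTO users (id, name) VALUES (2, 'bob');

    -- 1. Shadow table with the new schema (an added email column).
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

    -- 2. Triggers mirror writes that hit the live table during the copy.
    CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
        INSERT OR REPLACE INTO users_new (id, name) VALUES (NEW.id, NEW.name);
    END;
    CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
        INSERT OR REPLACE INTO users_new (id, name) VALUES (NEW.id, NEW.name);
    END;
    CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
        DELETE FROM users_new WHERE id = OLD.id;
    END;
""")

# 3. Copy existing rows; OR IGNORE keeps rows the triggers already wrote.
conn.execute(
    "INSERT OR IGNORE INTO users_new (id, name) SELECT id, name FROM users"
)

# Writes arriving mid-migration are mirrored by the triggers.
conn.execute("INSERT INTO users (id, name) VALUES (3, 'carol')")
conn.execute("UPDATE users SET name = 'robert' WHERE id = 2")

# 4. Cut over: drop the triggers and swap the table names.
conn.executescript("""
    DROP TRIGGER users_ins;
    DROP TRIGGER users_upd;
    DROP TRIGGER users_del;
    ALTER TABLE users RENAME TO users_old;
    ALTER TABLE users_new RENAME TO users;
""")

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)
```

A real tool would copy in batches and make the final swap atomic; both are skipped here to keep the sketch short.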

Another thing noted by our friends at The Register was how extensively Facebook leverages MySQL. I was working on a project revolving around Apache Hadoop and someone that was involved with it was under the incorrect assumption that Facebook stores most of its data on Hadoop.

At Facebook, MySQL is the primary repository for user data, with InnoDB the accompanying storage engine.
[..]
All Callaghan will say is that the company runs "X thousands" of MySQL servers. "X" is such a large number, the company needed a way of making index changes on live machines.

I wouldn't be surprised if they probably had a comparable number of MySQL servers to servers running Hadoop. After all Yahoo! is the biggest Hadoop user and at my last count had "only" about 25,000 servers running the software.

It certainly is unfortunate to see so many people out there see some sort of solution and think they can get it to solve all of their problems.

Hadoop is a good example, lots of poor assumptions are made around Hadoop. It's designed to do one thing really well, and it does that fairly well. But when you think you can adapt it into a more general purpose storage system it starts falling apart. Which is completely understandable, it wasn't designed for that purpose. Many people don't understand that simple concept though.

Another poor use of Hadoop is trying to shoehorn a real time application on top of it, it just doesn't work. Yet there are people out there (I've talked to some of them in person) who have devoted significant developer resources to try to attack that angle. Spend thirty minutes of time researching the topic and you can realize pretty quickly that it is a wasted effort. Google couldn't even do it!

Speaking of Hadoop, and Oracle for that matter, it seems Oracle announced a Hadoop-style system yesterday at Open World, only Oracle's version seems to be orders of magnitude faster (and orders of magnitude more expensive given the amount of flash it is using).

Using the skinnier and faster SAS disks, Oracle says that the Exadata X2-8 appliance can deliver up to 25GB/sec of raw disk bandwidth on uncompressed data and 50GB/sec across the flash drives. The disks deliver 50,000 I/O operations per second (IOPs), while the flash delivers 1 million IOPs. The machine has 100TB of raw disk capacity per rack and up to 28TB of uncompressed user data. The rack can load data at a rate of 5TB per hour. Using the fatter disks, the aggregate disk bandwidth drops to 14GB/sec, but the capacity goes up to 336TB and the user data space grows to 100TB.

The system is backed by an Infiniband-based network, I didn't notice specifics but assume 40Gbps per system.

Quite impressive indeed. Like Hadoop, this Exadata system is optimized for throughput, it can do IOPS pretty well too but it's clear that throughput is the goal. By contrast a more traditional SAN gets single digit gigabytes per second even on the ultra high end for data transfers at least on the industry standard SPC-2 benchmark.

  • IBM DS8700 rated at around 7.2 Gigabytes/second with 256 drives and 256GB cache costing a cool $2 million
  • Hitachi USP-V rated at around 8.7 Gigabytes/second with 265 drives and 128GB cache costing a cool $1.6 million

Now it's not really an apples to apples comparison of course, but it can give some frame of reference.
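To make that frame of reference a bit more concrete, here is the back-of-the-envelope arithmetic using only the approximate figures quoted above. The Exadata price is not given, so only bandwidth is compared for it, and the usual caveat applies: the SAN numbers are SPC-2 results while the Exadata number is a vendor claim.

```python
# Approximate figures as quoted in the post above.
systems = {
    "IBM DS8700": {"gb_per_sec": 7.2, "price_usd": 2_000_000},
    "Hitachi USP-V": {"gb_per_sec": 8.7, "price_usd": 1_600_000},
}
EXADATA_DISK_GB_PER_SEC = 25.0  # raw disk bandwidth claimed for the X2-8


def usd_per_gb_per_sec(price_usd, gb_per_sec):
    """Dollars paid for each GB/s of rated throughput."""
    return price_usd / gb_per_sec


for name, spec in systems.items():
    cost = usd_per_gb_per_sec(spec["price_usd"], spec["gb_per_sec"])
    ratio = EXADATA_DISK_GB_PER_SEC / spec["gb_per_sec"]
    print(f"{name}: ${cost:,.0f} per GB/s; "
          f"Exadata disk bandwidth claim is {ratio:.1f}x higher")
```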

It seems to scale really well according to Oracle -

Ellison is taking heart from the Exadata V2 data warehousing and online transaction processing appliance, which he said now has a $1.5bn pipeline for fiscal 2011. He also bragged that at Softbank, Teradata's largest customer in Japan, Oracle won a deal to replace 60 racks of Teradata gear with three racks of Exadata gear, which he said provided better performance and which had revenues that were split half-and-half on the hardware/software divide.

From 60 to 3? Hard to ignore those sorts of numbers!

Oh and speaking of Facebook, and Hadoop, and Oracle, as part of my research into the topic of Hadoop I came across this, I don't know how up to date it is but thought it was neat. Oracle DB is one product I do miss using, the company is filled with scumbags to be sure, I had to educate their own sales people on their licensing the last time I dealt with them. But it is a nice product, works really well, and IMO at least it's pretty easy to use especially with enterprise manager (cursed by DBAs from coast to coast I know!). Of course makes MySQL look like it's a text file based key-value pair database by comparison.

Anyways onto the picture!

Oh my god! Facebook is not only using Hadoop, but they are using MySQL, normal NAS storage, and even Oracle RAC! Who'da thunk it?

Find a tool or a solution that does everything well? The more generic the approach, the more difficult it is to pull it off, which is why so many solutions like that typically cost a significant amount of money, because there is significant value in what the product provides. If perhaps the largest open source platform in the world (Linux) has not been able to do it (how many big time open source advocates do you see running OS X and how many run OS X on their servers), who can?

That's what I thought.

(posted from my Debian Lenny workstation with updates from my Ubuntu Lucid Lynx laptop)
17Sep/10Off

No more Cranky Geeks?

TechOps Guy: Nate

What!! I just noticed that it seems the only online video feed I watch, Cranky Geeks, seems to be coming to an end? That sucks! I didn't stumble upon the series until about one and a half years ago on my Tivo. Been a big fan ever since. I rarely learned anything from the shows but I did like observing the conversations, it's not quite to the technical depth that I get into but it's a far cry from the typical "tech tv" videos/shows that don't seem to go beyond things like over clocking and what motherboard and video card to use for the latest games.

I know I'm a hell of a lot more cranky than anyone I ever saw on the show but they did bitch about some things. There seem to have been quite a few video blogs, for lack of a better word, that have bitten the dust in recent months; I guess the economy is taking its toll.

[Begin Another Tangent --]

I believe that we are entering the second phase of the great depression (how long until we are solidly in the second phase I'm not sure, won't know until we're there), the phase where states realize their budget shortfalls are too big for short term budget gimmicks and make drastic cuts and tax hikes, which further damage the economy. I don't blame anyone in particular for our situation; it's a situation that has been festering for more than thirty years. It's like trying to stop an avalanche with, I don't know, a snow plow?

This is what happens when you give people every incentive possible to pull demand forward, you run out of gimmicks to pull demand forward and are faced with a very large chasm that will only be healed with time, just look at Japan.

I have seen lots of folks say that this is not as bad as the real Great Depression, but they aren't taking into account the massive amount of social safety nets that have been deployed over the past 40-50+ years. I just saw a news report last night that said the rate of poverty among children is the same as it was in the 1960s. And to think the cost of living in the U.S. is so high that if you were paid a U.S. poverty-level income in many other countries, you'd be in the upper middle class.

Not sustainable, and as time goes on more and more people are realizing this. Unfortunately it's too late for many; they will be left behind, permanently.

My suggestion? Read the infrastructure report card. Yes I know infrastructure spending is not a short term stimulus, we need to take advantage of lower prices for wages, and materials, and rebuild the country, it will take years, maybe even a couple of decades but we need it. Long term problems call for long term solutions.

[End Another Tangent --]

I hope it doesn't go but it looks like it's essentially gone, and I just added the link to the blog roll a few days ago!

Noticed this from John in the comments -

The two companies couldn’t come to any agreement. This is a problem when you personally do not own the show. The fact is the show is not what advertising agencies want. They want two minute shows with a 15 second pre-roll ad at the beginning. They see no market for anything with a long format unless it is on network TV.

The irony is that the demographics for the show should be at $100/per k levels if they understood anything at all.

It’s amazing that we managed to get 4 1/2 years out of the show.

RIP

Sigh

RIP Cranky Geeks, I shall miss you greatly.

16Sep/10Off

How High?

TechOps Guy: Nate

I got this little applet on my Ubuntu desktop that tracks a few stocks of companies I am interested in (I don't invest in anything). I thought it was pretty crazy how close to the offer price the 3PAR stock price got today, I mean as high as 32.98. Everyone of course knows the final price will be $33; to think folks are trading the stock with only $0.02 of margin is, to me, pretty insane.

Looks a fair sight better than the only public company I have ever worked for, surprised they are still around even!

I never bought any options, good thing I guess because from the day I was hired the stock never did anything but go down, I think my options were in the ~$4.50 range (this was 2000-2002)

Just dug this up, I remember being so proud my company is on TV! Not quite as weird as watching the freeinternet.com commercials back when I worked there. A company that spent $7 million a month on bandwidth it didn't know it had and wasn't utilizing. Of course by the time they found out it was too late.

My company at the top of the list! I miss Tom Costello, he was a good NASDAQ floor guy. Screen shot is from March 2002. Also crazy that the DOW is only 68 points higher today than it was eight years ago.

16Sep/10Off

Fusion IO now with VMware support

TechOps Guy: Nate

About damn time! I read earlier in the year on their forums that they were planning on ESX support for their next release of code, originally expected sometime in March/April or something. But that time came and went and I saw no new updates.

I saw that Fusion IO put on a pretty impressive VDI demonstration at VMworld, so I figured they must have VMware support now, and of course they do.

I would be very interested to see how performance could be boosted and VM density increased by leveraging local Fusion IO storage for swap in ESX. I know of a few 3PAR customers that say they get double the VM density per host vs other storage because of the better I/O they get from 3PAR, though of course Fusion IO is quite a bit snappier.

With VMware's ability to set swap file locations on a per-host basis, it's pretty easy to configure. In order to take advantage of it, though, I think you'd have to disable memory ballooning in the guests in order to force the host to swap. I don't think I would go so far as to try to put individual swap partitions on the local Fusion IO for the guests to swap to directly, at least not when I'm using a shared storage system.

I just checked again, and as far as I can tell, from a blade perspective at least, the only player offering Fusion IO modules for their blades is still the HP c-Class, in the form of their IO Accelerator. With up to two expansion slots on the half width, and three on the full width blades, there's plenty of room for the 80 GB or 160 GB SLC models or the 320 GB MLC model. And if you were really crazy I guess you could use the "standard" Fusion IO cards with the blades by using the PCI Express expansion module, though that seems more geared towards video cards as upcoming VDI technologies leverage hardware GPU acceleration.

HP's Fusion IO-based I/O Accelerator

FusionIO claims to be able to write 5TB per day for 24 years, even if you cut that to 2TB per day for 5 years, it's quite an amazing claim.
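Spelling out the arithmetic behind that claim helps show why it is striking. This is a rough sketch only: it assumes 365-day years and decimal units (1 TB = 1000 GB), and it applies the claim to the 160 GB model mentioned above purely as an example, since the claim itself does not say which capacity it refers to.

```python
DAYS_PER_YEAR = 365  # leap days ignored


def total_writes_tb(tb_per_day, years):
    """Total data written, in TB, at a steady daily rate."""
    return tb_per_day * DAYS_PER_YEAR * years


def full_device_writes(total_tb, capacity_gb):
    """How many times the whole device would be overwritten."""
    return total_tb * 1000 / capacity_gb


claimed = total_writes_tb(5, 24)   # the vendor claim
cautious = total_writes_tb(2, 5)   # the discounted version

for label, total in (("claimed", claimed), ("cautious", cautious)):
    passes = full_device_writes(total, 160)  # 160 GB card, an assumption
    print(f"{label}: {total:,} TB written, "
          f"about {passes:,.0f} full passes over a 160 GB card")
```

So the claim works out to 43,800 TB of writes, and even the heavily discounted version is still 3,650 TB.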

From what I have seen (can't speak with personal experience just yet), the biggest advantage Fusion IO has over more traditional SSDs is write performance, of course to get optimal write performance on the system you do need to sacrifice space.

Unlike drive form factor devices, the ioDrive can be tuned to achieve a higher steady-state write performance than what it is shipped with from the factory.

15Sep/10Off

Time to drop a tier?

TechOps Guy: Nate

Came across an interesting slide show, The Ultimate guide to the flat data center network, at Network World. From page 7:

All of the major switch vendors have come out with approaches that flatten the network down to two tiers, and in some cases one tier. The two-tier network eliminates the aggregation layer and creates a switch fabric based on a new protocol dubbed TRILL for Transparent Interconnection of Lots of Links. Perlman is a member of the IETF working group developing TRILL.

For myself, I have been designing two tier networks for about 6 years now with my favorite protocol ESRP. I won't go into too much detail this time around, click the link for an in-depth article but here is a diagram I modified from Extreme to show what my deployments have looked like:

Sample ESRP Mesh network

ESRP is very simple to manage, scalable, mature, and with a mesh design like the above, the only place it needs to run is on the core. The edge switches can be any model, any vendor, managed, and even unmanaged switches will work without trouble. Fail over is sub second, not quite the 25-50ms that EAPS provides for voice grade. Not that I have had any way to accurately measure it, but I would say it's reasonable to expect a ~500ms fail over in an all-Extreme network (where the switches communicate via EDP), or ~750-1000ms for switches that are not Extreme.
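One low-tech way to get a rough measurement is to send probes through the network at a fixed interval while forcing a fail over, then look at the longest run of lost probes. The sketch below covers only the analysis step and is not tied to any Extreme tooling: the function names are made up, and it assumes the probe results are already collected as a list of booleans in send order.

```python
def longest_loss_run(results):
    """Length of the longest run of consecutive failed probes."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def outage_bounds_ms(results, interval_ms):
    """Rough (lower, upper) bounds on the outage, in milliseconds.

    With k consecutive probes lost, the outage covered at least the
    (k - 1) intervals between them and ended before the next good probe,
    so it lasted less than (k + 1) intervals. Round-trip time is ignored.
    """
    lost = longest_loss_run(results)
    if lost == 0:
        return (0, interval_ms)  # an outage shorter than one interval can hide
    return ((lost - 1) * interval_ms, (lost + 1) * interval_ms)
```

With probes every 100 ms, five lost in a row would bound the outage between 400 and 600 ms, so telling a ~500ms fail over from a ~750ms one needs an interval well under 100 ms.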

Why ESRP? Well because as far as I have seen since I started using it, there is no other protocol on the market that can do what it can do (at all, let alone as easily as it can do it).

Looking at TRILL briefly, it is unclear to me if it provides layer 3 fault tolerance or if you still must use a 2nd protocol like VRRP, ESRP or HSRP (ugh!) to do it.

The indication I get is that it is a layer 2 only protocol, if that is the case, seems very short sighted to design a fancy new protocol like that and not integrate at least optional layer 3 support, we've been running layer 3 for more than a decade on switches.

In case you didn't know, or didn't click the link yet, ESRP by default runs in both Layer 2 and Layer 3, though optionally it can be configured to run in only one layer if you prefer.