TechOpsGuys.com Diggin' technology every day

21Jun/10Off

HP BL685c G7 Launched – Opteron 6100

TechOps Guy: Nate

I guess my VMware dream machine will remain a dream for now. HP launched their next-generation G7 Opteron 6100 blades today, and they are still very compelling systems. But after the 6100 launched I saw that the die size had increased somewhat (not surprising), and it was enough to remove the ability to have 4 CPU sockets AND 48 memory slots on one full-height blade.

Still, it makes for a very good comparison illustrating the elimination of the 4P tax, that is, the premium associated with quad-socket servers. If you configure a BL465c G7 with 2x12-core CPUs and 128GB of memory (about $16,000) vs a BL685c G7 with 4x12-core CPUs and 256GB of memory (about $32,000), the cost per core and per GB is about the same: no premium.

By contrast, a BL685c G6 configured with six-core CPUs (i.e. half the number of cores of the G7), the same memory, same networking and same Fibre Channel costs roughly $52,000.
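To make the "no premium" point concrete, here is a quick back-of-the-envelope sketch using only the approximate prices quoted above. The dictionary keys and helper names are my own, and the prices are the rough configurator figures from this post, not official list prices.

```python
# Approximate prices and specs quoted in the post (not official list prices).
configs = {
    "g7_2p": {"price": 16_000, "cores": 2 * 12, "memory_gb": 128},  # 2-socket G7
    "g7_4p": {"price": 32_000, "cores": 4 * 12, "memory_gb": 256},  # BL685c G7
    "g6_4p": {"price": 52_000, "cores": 4 * 6, "memory_gb": 256},   # BL685c G6
}


def cost_per_core(cfg):
    """Dollars per CPU core for one configuration."""
    return cfg["price"] / cfg["cores"]


def cost_per_gb(cfg):
    """Dollars per GB of memory for one configuration."""
    return cfg["price"] / cfg["memory_gb"]


for name, cfg in configs.items():
    print(f"{name}: ${cost_per_core(cfg):,.0f}/core, ${cost_per_gb(cfg):,.0f}/GB")
```

The two G7 configurations come out identical per core and per GB, while the G6 quad-socket costs more than three times as much per core.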

These have new Flex Fabric 2 NICs, which from the specs page seem to include iSCSI or FCoE support (I assume some sort of software licensing is needed to unlock the added functionality, though I can't find evidence of it). Here is a white paper on the Flex Fabric stuff; from what I gather it's just an evolutionary step of Virtual Connect. I of course have never had any real interest in FCoE (search the archives for details), but it's nice I suppose that HP is giving the option to those that do want to jump on that wagon.

19Jun/10Off

40 Million IOPS in two racks

TechOps Guy: Nate

Fusion IO does it again: another astonishing level of performance in such an efficient design. From the case study:

LLNL used Fusion’s ioMemory technology to create the world’s highest performance storage array. Using Fusion’s ioSANs and ioDrive Duos, the cluster achieves an unprecedented 40,800,000 IOPS and 320GB/s aggregate bandwidth.
Incredibly, Fusion’s ioMemory allowed LLNL to accomplish this feat in just two racks of appliances– something that would take a comparable hard disk-based solution over 43 racks. In fact, it would take over 100 of the SPC-1 benchmark’s leading all-flash vendor systems combined to match the performance, at a cost of over $300 million.

40 Million IOPS @ ~250 IOPS per 15K RPM disk, you're talking 160,000 disk drives.
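The arithmetic behind that number, as a small sketch. The ~250 IOPS per 15K RPM disk figure is the rule of thumb used above, and the function name is just for illustration.

```python
import math


def disks_needed(target_iops, iops_per_disk=250):
    """Number of disks needed to reach target_iops at a given per-disk rate."""
    return math.ceil(target_iops / iops_per_disk)


print(disks_needed(40_000_000))  # the round 40 million figure
print(disks_needed(40_800_000))  # the 40,800,000 IOPS quoted in the case study
```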

Not all flash is created equal of course, and many people don't understand that. They just see that this one is cheap and that one is not, without having any clue why (shocker).

It's just flat out irresponsible to ignore such an industry-changing technology, especially for workloads that deal with small (sub-TB) amounts of data.

18Jun/10Off

Cross Pollination

TechOps Guy: Nate

I don't know about you but to me it's kind of odd. It's not something I would expect to be such a regular occurrence. Maybe it's just the region, or maybe it's broader.

I'm talking about vendor sales reps and engineers jumping from one company directly to the competition. I mean just from my own experience, I know people who have:

  • Moved from Extreme Networks to Juniper
  • Moved from Foundry Networks to Extreme
  • Moved from HP networking to Extreme
  • Moved from Foundry Networks to A10 Networks
  • Moved from F5 Networks to A10 Networks
  • Moved from EMC to 3PAR
  • Moved from NetApp to 3PAR
  • Moved from Equallogic to 3PAR
  • And recently I saw someone I hadn't seen in more than a year and a half who had moved from Hitachi to Xiotech

Maybe it's just the line of work, but to me at least it seems like an amazing amount of cross pollination, to the point where it's hard to tell on occasion what the person really believes. I mean one minute they are pitching product X to you and bashing product Y, then the next they're doing the opposite.

Then there are other less direct migrations of course going from a manufacturer to a VAR or a distributor, but I've been more fascinated by those making the leap from one manufacturer to another.

I suppose it's just a job at the end of the day.

15Jun/10Off

Storage Benchmarks

TechOps Guy: Nate

There are two main storage benchmarks I pay attention to:

Of course benchmarks are far from perfect, but they can provide a good starting point when determining what type of system you need to look towards based on your performance needs. Bringing in everything under the sun to test in house is a lot of work, and much of it can be avoided by getting some reasonable expectations up front. Both of these benchmarks do a pretty good job. And it's really nice to have the database of performance results for easy comparison. There are tons of other benchmarks that can be used, but very few have a good set of results you can check against.

SPC-1 is better as a process primarily because it forces the vendor to disclose the cost of the configuration and 3 years of support. They could improve on this further by forcing the vendor to provide updated pricing for the configuration for 3 years. While the performance of the configuration should not change (given the same components), the price certainly should decline over time, so the cost aspects become harder to compare the further apart the tests are (duh).

SPC-1 also forces full disclosure of everything required to configure the system, down to the CLI commands to configure the storage. You can get a good idea on how simple or complex the system is by looking at this information.

SPC-1 doesn't have a specific disclosure field for cost per usable TB, but it is easy enough to extrapolate from the other numbers in the reports. It would be nice if this was called out (well, nice for some vendors, not so much for others). Cost per usable TB can really make systems that utilize short stroking to get performance stand out like a sore thumb. Other metrics that would be interesting are Watts per IOP and Watts per usable TB. The SPC-1E test reports Watts per IOP, though I have a hard time telling whether or not the power usage is taken at max load; the report seems to indicate power usage was calculated at 80% load.
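Here is a rough sketch of how cost per usable TB can be derived from a report's headline numbers, and why it exposes short stroking. The two systems and all of their numbers are made up for illustration (they are not real SPC-1 results), and the field names are my own rather than the report's exact labels.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    total_price_usd: float  # tested configuration price incl. 3 years of support
    iops: float             # reported IOPS
    usable_tb: float        # capacity actually presented to the benchmark
    raw_tb: float           # physical capacity of all disks in the config

    def cost_per_iop(self):
        return self.total_price_usd / self.iops

    def cost_per_usable_tb(self):
        return self.total_price_usd / self.usable_tb

    def capacity_utilization(self):
        return self.usable_tb / self.raw_tb


# Hypothetical systems: B short strokes its disks to hit a higher IOPS number.
system_a = BenchmarkResult("System A", 1_000_000, 100_000, usable_tb=50, raw_tb=120)
system_b = BenchmarkResult("System B", 900_000, 120_000, usable_tb=15, raw_tb=150)

for r in (system_a, system_b):
    print(f"{r.name}: ${r.cost_per_iop():.2f}/IOP, "
          f"${r.cost_per_usable_tb():,.0f}/usable TB, "
          f"{r.capacity_utilization():.0%} of raw capacity used")
```

In this made-up comparison System B wins on cost per IOP but costs three times as much per usable TB, which is the kind of thing the headline price-performance number hides.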

Storage performance is by no means the only aspect you need to consider when getting a new array, but it usually is in at least the top 5.

I made a few graphs for some SPC-1 numbers. Note the cost numbers need to be taken with a few grains of salt of course, depending on how close together the systems were tested; the USP for example was tested in 2007. But you can see trends at least.

The majority of the systems tested used 146GB 15k RPM disks.

Somewhat odd configurations:

  • From IBM, the easy tier config is the only one that uses SSD+SATA, and the other IBM system is using their SAN Volume Controller in a clustered configuration with two storage arrays behind it (two thousand spindles).
  • Fujitsu is short stroking 300GB disks.
  • NetApp is using RAID 6 (except for IBM's SSDs, everyone else is RAID 1)

I've never been good with spreadsheets or anything so I'm sure I could make these better, but they'll do for now.

Links to results:

15Jun/10Off

Next generation SSD

TechOps Guy: Nate

Another interesting article from our friends at The Register. This one talks about a new startup which is promising SLC-like performance and reliability for MLC-like prices.

[..]

says the 200GB product has a five-year endurance at 2TB/day write data and the 400GB model a five-year endurance at 4TB/day. This is with random, non-compressible data.

[..]

Genesis has a 3Gbit/s SATA interface and has a 30,000 random read IOPS rating (4KB blocks), and a 20,000 random write IOPS rating. It provides 180MB/s sustained write and 220MB/s sustained read bandwidth.

Certainly looks interesting, not nearly as fast (or reliable) as say Fusion IO SLC or MLC for that matter, but probably a bit cheaper too.
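To put the quoted endurance figures in perspective, here is a small sketch converting them into total data written and full drive writes per day. It assumes 365-day years and decimal units (1TB = 1000GB); the function names are mine.

```python
def total_writes_tb(tb_per_day, years, days_per_year=365):
    """Total data written over the rated lifetime, in TB."""
    return tb_per_day * years * days_per_year


def drive_writes_per_day(tb_per_day, capacity_gb):
    """How many times the whole drive is overwritten each day."""
    return tb_per_day * 1000 / capacity_gb


for capacity_gb, tb_per_day in ((200, 2), (400, 4)):
    print(f"{capacity_gb}GB model: "
          f"{total_writes_tb(tb_per_day, 5):,} TB written over 5 years, "
          f"{drive_writes_per_day(tb_per_day, capacity_gb):.0f} drive writes per day")
```

Both models work out to the same ten full drive writes per day, so the rating scales linearly with capacity.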

14Jun/10Off

Sea Micro launches 512 core Atom server

TechOps Guy: Nate

An article from our friends at The Register talks about a new server design to hit the market: a very innovative solution from a recently decloaked stealth startup, SeaMicro, based on the Intel Atom processor and called the SM10000.

It looks to be targeted at the hyperscale arena. SGI tried something similar to this last year with their Microslice design, though it's not nearly as efficient as this box is.

The SM10000 is a fairly radical departure from current designs. Perhaps the closest design I've come across to this SM monster is one from Dell's DCS division a few years ago that The Register reported on. This goes several steps beyond that by including in a single 10U chassis:

  • Up to 512 Atom CPUs each with up to 2GB memory
  • Up to 64 x 2.5" disks
  • Integrated ethernet switching and load balancing
  • Integrated terminal server
  • Virtualized I/O

This is targeted at a specific application, mainly web serving. The massive amount of parallelism in the system combined with the low power footprint (a mere 4W/server) can provide a high amount of throughput for many types of web applications. The ability to have SSDs in the system allows high I/O rates for smaller data sets.

From one of their white papers:

[..]hardware-based CPU I/O virtualization enables SeaMicro to eliminate 90 percent of the components from the server and to shrink the motherboard to the size of a credit card. Hundreds of these low-power, card-sized computational units are tied together with a supercomputer-style fabric to create a massive array of independent but linked computational units. Work is then distributed over these hundreds of CPUs via hardware- and software-based load-balancing technology that dynamically directs load to ensure that each of the CPUs is either in its most efficient zone of performance or is sleeping. The key technologies reside in three chips of SeaMicro’s design, one ASIC and two FPGAs, and in the management, routing, and load-balancing software that directs traffic across the fabric.

It's clearly targeted at the scale-out web serving market, the likes of Google, Facebook, Yahoo. These aren't general purpose servers. I saw some stupid posts on Slashdot mentioning trying to run VMware or something on top of this. The system is virtualized at the hardware level, so there's no need for a software hypervisor running on top.

From another white paper, talking about the virtualized disk technology:

The SeaMicro SM10000 can be configured with 0 to 64 2.5 inch SATA hard disk drives (HDD) or solid state drives (SSD). The 512 CPUs in the system can be allocated portions of a disk or whole disks. A physical disk (HDD or SSD) can be divided into multiple virtual disks – from 2GB to the maximum capacity of the disk – and assigned to one or more CPUs. Data resiliency is maintained by marking a disk to be part of a RAID pool or by assigning multiple disks to a CPU. The system can be configured to run with or without disk, ensuring the flexibility to appropriately provision storage for the desired applications
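As a way of picturing the carving described in that passage, here is a toy model of one physical disk being split into virtual disks and assigned to CPUs. This is purely my own illustration of the idea, not SeaMicro's management interface; the class, method names and the 64GB example disk are all hypothetical. Only the 2GB minimum comes from the white paper.

```python
MIN_VDISK_GB = 2  # smallest virtual disk size quoted in the white paper


class PhysicalDisk:
    """Toy model of a disk carved into virtual disks assigned to CPUs."""

    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.vdisks = []  # list of (size_gb, cpu_ids) tuples

    def free_gb(self):
        return self.capacity_gb - sum(size for size, _ in self.vdisks)

    def carve(self, size_gb, cpu_ids):
        """Allocate a virtual disk and assign it to one or more CPUs."""
        if size_gb < MIN_VDISK_GB:
            raise ValueError(f"virtual disks must be at least {MIN_VDISK_GB}GB")
        if size_gb > self.free_gb():
            raise ValueError("not enough free space on the physical disk")
        self.vdisks.append((size_gb, tuple(cpu_ids)))


ssd = PhysicalDisk("ssd0", capacity_gb=64)  # hypothetical 64GB SSD
ssd.carve(2, cpu_ids=[0])                   # small private disk for CPU 0
ssd.carve(10, cpu_ids=range(1, 5))          # one virtual disk assigned to CPUs 1-4
print(ssd.free_gb())
```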

My only questions at this stage would be:

  • How well does it work? We don't know the internals of where they got their ethernet switching or load balancing technology from, or even their RAID technology.
  • Their CPU of choice is 32-bit. For many workloads this is fine, though many others need 64-bit.
  • Questions on how the shared disks work. You have the ability to take an SSD, for example, and put shared application code on a read-only portion of the disk that can be read by as many servers in the system as you want. I suppose to take maximum advantage of the technology in the system you may have to make some changes to your application(s). It would be cool if they offered the ability to have the shared disk be writable by more than one system, using a cluster file system or something. Maybe this is possible, I don't know.

A base configuration starts at about $139,000 according to The Register. No mention of what that includes though.

Certainly seems to be a system that has a lot of promise for the market it is targeted towards!

Did I ever mention I love technology?

11Jun/10Off

Investing in IT vs spending in IT

TechOps Guy: Nate

My good friend Chuck over at EMC (ok we've never met but he seems like a nice guy, we could be friends) wrote an interesting article about Investing in IT vs Spending on IT. I thought it was a really good read. I hadn't thought of things in that way, but it made me realize I am one who wants to Invest in IT infrastructure, even if it means paying more up front; the value add of some solutions is just difficult to put numbers on.

Take storage for example, since Chuck is a storage guy. There's a lot more to storage than cost per TB, cost per IOP, cost per usable TB, and even more than cost of power+cooling for the solution. The smaller things really do add up over time, but how do you put numbers on them? Take something as simple as granular monitoring: when I went through a storage refresh a while back, the established vendor really had no way of measuring the performance of their own system to develop a plan for a suitable technology refresh. It wasn't a small system either; it was a big, fairly expensive (for the time) one.

Would you have expected to replace one storage system with another that had less than half the number of disks, and roughly 75% less raw IOPS (on paper)? Would you have expected the new system to not only outperform and out scale the old but continue to eat a significant amount of growth over the following year before needing an upgrade? If you're a normal person I would expect you to not expect that. But that's what happened.

My approach is to establish a track record at an organization. This may take a few months, or may take a year (maybe much longer if it's a big company). Once you have established X number of successful projects, a higher degree of trust is put in you to have more lateral control and influence on how things work. Less hand holding and fewer minute justifications are required to get your point across, and you can get more things done.

Maybe that thinking is too logical, I don't know. It's how I think though. I put more faith in people the more I see how good they are at their jobs. I trust them more; if they turn out to provide good solutions or even good angles of thought, I believe I can rely on them to do that line of work rather than looking over their shoulder double checking everything. I think it's how you can scale. Of course not all employees measure up, and I would say especially in larger organizations most do not (government is especially bad I hear).

No one person can run it all, as much as they'd like. I've tried, and well, the results, while not horrible, weren't as good as having more people doing the work. I learned the hard way to delegate more work, whether it's to co-workers, or to contractors, or even to vendors. People take vendors for granted, but there is a lot of experience and knowledge they can bring to the table. Not all vendor teams are created equal of course.

If you just want to spend on IT, don't hire someone like me, I don't want to work for you. If you want to invest in IT, to give your organization more leadership in new technologies that can improve efficiencies and lower costs, then you may want someone like me. Which is why I gravitate towards smaller, more highly technical organizations. They usually don't have the economies of scale to do things as well as the big guns out there, so it's up to people like me to develop innovative solutions to compete differently. If you read the blog you'll see I don't subscribe to any one vendor stack. I like many different products from many different vendors depending on what the requirements are.

From a vendor perspective (since it's been 5 years since I worked with a contractor of sorts) I do like to have a good relationship with the vendor, they can be a valuable source of information. Vendors either love me or hate me, it really depends on their products, as folks that have worked with me can attest. It also depends on how technical the vendor can get with me. I like to go deep into the technical realm. And I believe I do challenge the System Engineers at my vendors with tough questions. Those that don't measure up don't last long. I have high expectations of myself, and I have high expectations of those around me, frequently too high. I don't like to play political games where you try to screw them over because you know they'll screw you back the first chance you get. Having a good relationship is one of those things that's hard to put a number on. To me it's worth a decent amount.

Jake, another person on this blog (hi Jake!), is similar, though he's a lot more loyal than me, which again can be a good thing as well. Changing technology paths every 15 minutes is not a good idea, and having a dozen different server vendors in your racks because different ones provided 5-10% better pricing at that particular time of day is not a good idea either.

Speaking of Jake, I remember when I first started at my previous company and they were doing negotiations with Oracle on licensing. They were out of compliance on licensing (they paid for Oracle SE One but were using Oracle EE) and were facing hefty fines. I tried to propose an alternative solution (going to Oracle Standard Edition, which is significantly different from SE One), which would have saved significant amounts of money with really no loss in functionality (for our apps at the time). I was a new (literally a few weeks) employee and Jake dismissed my opinion, which I could understand: at the time I was new and had no track record, and nobody knew if I knew what I was talking about. It was OK though, so they paid their fines, and licensed some new Oracle stuff as part of the settlement.

The next year rolled around and Oracle came back again to do an audit, and once again found massive numbers of violations and the company was once again facing large amounts of penalties to get back in compliance. Apparently the previous process wasn't as transparent as they expected, either the Oracle rep was misleading the company or was generally incompetent, I don't know since I wasn't involved in those talks.

Once again I strongly urged the company to migrate to Standard Edition to slash licensing costs, and this time they listened to me. It took a few weeks to get all of the environments migrated over, including a full weekend of my time migrating production, doing all sorts of unsupported things to get it done (value adds for you) to minimize downtime (while you can typically go from Oracle SE to EE without downtime, you can't do it the other way around). I went the extra mile to establish a standby DB server with log replication and consistent database backups (because you can't run RMAN against a standby DB, or at least you couldn't on 10gR2 at the time). All of it worked great, and we (as expected) slashed our Oracle licensing fees.

Of course I didn't have to do that, I could have sat by and watched them pay up in fees again (several hundred thousand dollars in total). But I did do it, I did go to them and say I'm willing to work my ass off for several weeks to do this to save you money. Many people I've come across I don't think have the dedication to volunteer for such an effort. They'll of course do it if asked, but frequently won't push hard so they can work more. What did I get out of it? I suppose more than anything a sense of accomplishment and pride. I certainly didn't get any monetary rewards from the company. I didn't get to re-allocate that portion of the budget towards things we were in very desperate need of.

The only frustrating part of the whole situation was that when we licensed Oracle EE originally, the optimal CPU configuration at the time was the fastest dual-core CPUs you could get. So we ordered an HP DL380 G5, I think it was, with dual-proc dual-core CPUs. Given the system was marked as compatible with quad-core CPUs, I figured it would be an easy switch when or if we went to Standard Edition (which charges per socket not per core, a fact I had to correct Oracle's own reps on more than one occasion). But when the time came it turned out that we had to replace the motherboard on the HP system because the particular part number we had was not compatible with quad core. It took lots of support calls and HP reps insisting that our system was compatible before someone dug further into the details and found out it was not. But we got the board and CPUs replaced and still of course came out way ahead.
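A small sketch of why the CPU choice flips between the two editions. The per-socket rule for Standard Edition is as described above; for Enterprise Edition I'm assuming per-core licensing with a core factor of 0.5, which is an assumption for illustration only (check Oracle's own core factor table for real numbers). The function names are mine and no actual prices are used.

```python
import math


def ee_processor_licenses(sockets, cores_per_socket, core_factor=0.5):
    """Enterprise Edition: licensed per core, scaled by a core factor.

    core_factor=0.5 is an assumed value for illustration only.
    """
    return math.ceil(sockets * cores_per_socket * core_factor)


def se_processor_licenses(sockets, cores_per_socket):
    """Standard Edition: licensed per socket, so core count doesn't matter."""
    return sockets


for cores in (2, 4):  # dual-core vs quad-core CPUs in a two-socket server
    print(f"2 sockets x {cores} cores: "
          f"EE needs {ee_processor_licenses(2, cores)} licenses, "
          f"SE needs {se_processor_licenses(2, cores)}")
```

Under per-core licensing, doubling the cores doubles the license count, so the fastest dual-core parts made sense. Under per-socket licensing the extra cores are free from a licensing point of view, which is why the quad-core swap was worth the motherboard hassle.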

When I come up with solutions it's not half assed. You may have a problem and ask me and I may have an immediate solution for your problem, but it's not because I just read about it on Slashdot that morning. My solutions are heavily thought out over a period of months or years (usually years), and it's not obvious to people that I work with (or for, often enough) how much thought actually went into a particular solution, regardless of the amount of time that elapsed since you posed the question to me. I love technology and I am always on the hunt for what I consider best of breed in whatever industry the product is in. I'm not afraid to get my hands dirty, I'm not afraid to stand by my decisions in the event I make a mistake, and I really like to operate in an environment of trust.

Would it surprise you that I led an effort to launch an e-commerce site on top of VMware GSX back in early 2004 so my company's customer would be satisfied? How many of you were running production facing VMware servers back then? Were they doing credit card transactions? I only did it because the company's software failed to install properly during a system upgrade, and in order to keep the customer happy we decided to build them their own standalone cluster. We went from 0 to fully functional and tested in about 96 hours, and most of that time was NOT sleeping.

And before you ask, NO I am not one of those people who is going to go suggest an open source solution for every problem on the planet just because it's free. I use open source where I believe it adds value regardless of the cost, and use commercial, closed platforms (whether it's VMware or even Oracle) where I believe they can add value. Don't equate creative solutions with using free software across the board. That's just as stupid as using a closed source ecosystem for all of your IT infrastructure. You won't catch me trying to replace your Active Directory server with a Samba+LDAP system. You could have caught me trying to do that 10 years ago. I'm long past all that.

I can only speak for myself, but let me do my job and you won't be disappointed. I'm not afraid to say I am one of those people who can do some pretty amazing things given the right resources. If you're on LinkedIn you can check the recommendations on my profile for some examples.

So, round about, thanks Chuck that was a good read. Getting all of this written down really makes me feel a bit better too.
