Diggin' technology every day


More inefficient storage

TechOps Guy: Nate

Another random thought. I got woken up this morning and wound up checking what's new on SPC-1, and a couple weeks ago the Chinese company Huawei posted results for their Oceanspace 8100 8-node storage system. This system seems to be similar to the likes of the HDS USP/VSP and IBM SVC in that it has the ability to virtualize other storage systems behind it. The system is powered by 32 quad-core processors, or 128 CPU cores.

The thing that caught my eye is the paragraph that appears in every SPC-1 disclosure:

Unused Storage Ratio: Total Unused Capacity (XXX GB) divided by Physical
Storage Capacity (XXX GB) and may not exceed 45%.

So what is Huawei's Unused storage ratio? - 44.77%

I wonder how hard it was for them to get under the 45% limit. I bet they were at 55-60% and had to yank a bunch of drives out or something to decrease their ratio.

From their full disclosure document it appears their tested system has roughly 261TB of unused storage on it. That's pretty bad; by contrast, the 3PAR F400 has a mere 75GB of unused capacity (0.14%), and the bigger T800 has roughly 21TB (15%).
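The ratio arithmetic itself is trivial; here's a quick sketch, where the capacity figures are hypothetical round numbers chosen for illustration, not taken from the disclosure:

```python
def unused_storage_ratio(unused_gb: float, physical_gb: float) -> float:
    """SPC-1 Unused Storage Ratio: Total Unused Capacity divided by
    Physical Storage Capacity, expressed as a percentage. A valid
    submission may not exceed 45%."""
    return 100.0 * unused_gb / physical_gb

# Hypothetical figures: roughly 261 TB unused against roughly 583 TB
# physical lands just under the 45% cap.
print(round(unused_storage_ratio(261_000, 583_000), 2))  # 44.77
```

The closer that number creeps toward 45%, the more likely a vendor had to pull drives just to qualify.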

One would think Huawei would have been better off using 146GB disks instead of a mix of 300GB, 450GB and 600GB disks. (Another question is what the point of mismatched disks is for this test. Maybe they didn't have enough of one drive type, which would be odd for a drive array manufacturer, or maybe they mixed drive types to bring the unused capacity down after having started with nothing but 600GB disks.)

Speaking of drive sizes, one company I know well has a lot of big Oracle databases and is more I/O bound than space bound, so it benefits them to use smaller disk drives. Their current array manufacturer no longer offers 146GB disk drives, so they are forced to pay quite a bit more for the bigger disks.

Lots of IOPS to be sure, 300,000 of them (260 IOPS per drive), and 320GB of cache (see note below!), but it certainly seems that you could do this a better way.

Looking deeper into the full disclosure document (Appendix C, page 64) for the Huawei system reveals this little gem:

The creatlun command creates a LUN with a capacity of 1,716,606 MiB. The -p 0 parameter in the creatlun command sets the read cache policy to no prefetch, and the -m 0 parameter sets the write cache policy to write cache with no mirroring.

So they seem to be effectively disabling the read cache and disabling cache mirroring, making all of the cache an unprotected write-back cache? I would imagine they ran the test, found their read cache ineffective, so disabled it and devoted it all to write cache, and re-ran the test.

Submitting results without mirrored cache seems, well, misleading to say the least. Glad there is full disclosure!

The approximate cost of the Huawei system seems to be about $2.2 million, according to the Google exchange rate.

While I am here, what is it with 8-node storage systems? What is magical about that number? I've seen a bunch of different ones, both SAN and NAS, that top out at eight. Not 10? Not 6? It seems a strange coincidence, and has always bugged me for some reason.


HP serious about blade networking

TechOps Guy: Nate

I was doing my rounds and noticed that HP launched a new blade for the Xeon 6500/7500 processors (I don't yet see news of this breaking on The Reg, so I beat them for once!): the BL620c G7. They also have another blade, the BL680c G7, a double-wide solution which to me looks like nothing more than a pair of BL620c G7s stacked together, using the backplane to link the systems. IBM does something similar on their BladeCenter to connect a memory expansion blade to their HX5 blade.

But what really caught my eye more than anything else is how much networking HP is including on their latest blades, whether it is the BL685c G7, or these two newer systems.

  • BL685c G7 & BL620c G7 both include 4 x 10GbE FlexFabric ports on board (no need to use expansion slots) - that is up to 16 FlexNICs per server - with three expansion slots you can get a max of 10x10GbE ports per server (or 40 FlexNICs per server)
  • BL680c G7 has 6 x 10GbE FlexFabric ports on board providing up to 24 FlexNICs per server - with seven expansion slots you can get a max of 20x10GbE ports per server (or 80 FlexNICs per server)

Side note: FlexFabric is HP's term for its converged network adapter (CNA) technology.
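The port math in the bullets above works out neatly; a quick sketch, where ports-per-slot and FlexNICs-per-port are inferred from the quoted maxima rather than taken from an HP spec sheet:

```python
# Inferred from the numbers above, not from HP documentation:
FLEXNICS_PER_PORT = 4   # each FlexFabric 10GbE port carves into 4 FlexNICs
PORTS_PER_SLOT = 2      # implied: (10 max ports - 4 onboard) / 3 slots

def max_ports(onboard: int, slots: int) -> int:
    """Maximum 10GbE ports: onboard ports plus dual-port expansion cards."""
    return onboard + slots * PORTS_PER_SLOT

for name, onboard, slots in [("BL685c/BL620c G7", 4, 3), ("BL680c G7", 6, 7)]:
    ports = max_ports(onboard, slots)
    print(f"{name}: {ports} x 10GbE, up to {ports * FLEXNICS_PER_PORT} FlexNICs")
```

Which reproduces the 10-port/40-FlexNIC and 20-port/80-FlexNIC maxima from the bullets.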

Looking at the stock networking from Cisco, Dell, and IBM:

  • Cisco - their site is complex as usual, but from what I can make out their B230M1 blade has 2x10Gbps CNAs
  • Dell and IBM are stuck in 1GbE land, with IBM providing 2x1GbE on their HX5 and Dell providing 4x1GbE on their M910

What is even nicer about the extra NICs on the HP side, at least on the BL685c G7 (and I presume the BL620c G7), is that because they are full height, the connections from the extra 2x10GbE ports feed into the same slots on the backplane. That means with a single pair of 10GbE modules on the chassis you can get a full 4x10GbE per server (8 full-height blades per chassis). Normally, if you put extra NICs on the expansion slots, those ports are wired to different slots in the back, requiring additional networking components in those slots.

You might be asking yourself: what if you don't have 10GbE and only have 1GbE networking? Well, first off, upgrade; 10GbE is dirt cheap now, and there is absolutely no excuse for getting these new higher-end blade systems and trying to run them off 1GbE. You're only hurting yourself by attempting it. But in the worst case, if you really don't know what you're doing and happen to get these HP blades with 10GbE on them and want to connect them to 1GbE switches, you can; they are backwards compatible with 1GbE switches, either with HP's various 1GbE modules or with the 10GbE pass-through module, which supports both SFP and SFP+ optics.

So there you have it: 4x10GbE ports per blade standard. If it were me, I would take one port from each network ASIC and assign FlexNICs for VM traffic, and take the other port from each ASIC and enable jumbo frames for things like vMotion, fault tolerance, iSCSI, and NFS traffic. I'm sure the cost of adding the extra dual-port card is trivial when integrated onto the board, and HP is smart enough to recognize that!

Having more FlexNICs on board means you can use those expansion slots for other things, such as Fusion-io accelerators, or maybe InfiniBand or native Fibre Channel connectivity. It also allows for greater flexibility in network configuration; take for example the Citrix Netscaler VPX, which, last I checked, required essentially dedicated network ports in vSphere in order to work.

Myself, I'm still not sold on the CNA concept at this point. I'm perfectly happy to run a couple of FC switches per chassis, and a few extra cables to the storage system.

Filed under: Networking 1 Comment

Isilon gets taken out by EMC

TechOps Guy: Nate

Looks like EMC did it after all, buying Isilon for $2.25 Billion. Probably the biggest tech deal for the Seattle area for quite some time.

I haven't paid too much attention to Isilon recently, but they do seem to have a nice product for the scale-out media space: lots of big files and high throughput. Isilon, along with Panasas, seems unique in tightly integrating the storage controller with the NAS side, while other solutions are purely gateway approaches of one sort or another.

So who's next?



Red Hat jacks up RHEL pricing

TechOps Guy: Nate

I didn't think they would do this, but Red Hat, along with RHEL 6 introduced some pretty dramatic price hikes.

They seem to have done away with "unlimited socket" licensing and now have at least two tiers: two-socket and four-socket.

What used to cost $2,499 for Red Hat Advanced Server Premium with unlimited sockets now costs $6,498 for four sockets: 2.6 times the price, a 160% increase.
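For the record, the arithmetic (the "260%" figure that gets thrown around is the new price as a percentage of the old, not the size of the increase):

```python
old_price, new_price = 2_499, 6_498

multiple = new_price / old_price                       # new price as a multiple of old
increase = (new_price - old_price) / old_price * 100   # percent increase

print(f"{multiple:.1f}x the old price, a {increase:.0f}% increase")
# 2.6x the old price, a 160% increase
```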

That is very Oracle-esque, maybe even worse than Oracle; the biggest hikes I recall Oracle doing were in the 30-50% range. I wonder if there will be any pushback from customers.

They don't seem to mention socket licensing beyond 4 sockets.

Filed under: linux, News 4 Comments

RHEL 6 Launched

TechOps Guy: Nate

I didn't even notice it; as The Register put it, it was a very quiet launch. While I have been using Debian on my home systems for more than twelve years now, I much prefer to use Red Hat Enterprise at work.

And RHEL 6 looks like a pretty decent upgrade:

  • Significantly improved power management (aka lower cpu usage for idle VMs) - hello higher consolidation ratios
  • Hot add CPU and memory (wish there was hot remove - if there is I don't see it mentioned)
  • 85% increase in number of packages in the distribution - yay, maybe there will be a lot less things I will have to compile on my own

Sorry, I still can't help but laugh at the scalability claims:

Red Hat Enterprise Linux 6 has been designed to deliver performance and scalability without sacrificing data integrity. It allows scaling to 4,096 CPUs and 64 terabytes of RAM, providing a solid foundation for supporting upcoming generations of hardware.

It is interesting that the max file system size for the ext4 file system is the same as ext3 - 16TB. Seems kind of dinky.

XFS goes to 100TB, which also seems small; maybe those are just "tested" limits, as I would expect XFS to scale far higher than that given its SGI heritage. The XFS documentation says that on 64-bit Linux you can go to 18 exabytes, which I think is just as crazy as Red Hat's CPU claims, but as long as you can safely do a few hundred TB, that is more than enough for these days I think.

I can't imagine anyone committing a petabyte or more to a single file system for a good long while at least.

I'll let others play with KVM until at least RHEL 7, until then it's VMware for me.

Filed under: linux, News 3 Comments

10% Tale of two search engines

TechOps Guy: Nate

Saw! an! article! today! and! thought! of! a! somewhat! sad! situation,! at! least! for! those! at! Yahoo!

Not long ago, Google announced they would be giving every employee in the company a 10% raise starting January 2011. One super bad ass engineer is apparently going to get a $3.5M retention bonus to not go to the competition. Lucky for him, perhaps, that Google is based in California, where non-competes are not enforceable.

Now Yahoo! has announced somewhat of the opposite, no raises, in fact they are going to give the axe to 10% of their employees.

It's too bad that Yahoo! lost its way so long ago. There was a really good blog post about what went wrong with Yahoo!, going back more than a decade; really interesting insight into the company.


Extreme VMware

TechOps Guy: Nate

So I was browsing some of the headlines of the companies I follow during lunch and came across this article (it seems to be available on many outlets), which I thought was cool.

I've known VMware has been a very big, happy user of Extreme Networks gear for a good long time now, though I wasn't aware of anything public about it, at least until today. It really makes me feel good that despite VMware's partnerships with EMC and NetApp, which include Cisco networking gear, at the end of the day they chose not to run Cisco for their own business.

But going beyond even that, it makes me feel good that politics didn't win out here. Obviously the people running the network have a preference, and they were either able to fight, or didn't have to fight, to get what they wanted. Given that VMware is a big company with a big relationship with Cisco, I would kind of expect Cisco to try to muscle their way in. Many times they succeed, depending on the management at the client company, but fortunately for the likes of VMware they did not.

SYDNEY, November 12. Extreme Networks, Inc., (Nasdaq: EXTR) today announced that VMware, the global leader in virtualisation and cloud infrastructure, has deployed its innovative enterprise, data centre and Metro Ethernet networking solutions.

VMware’s network features over 50,000 Ethernet ports that deliver connectivity to its engineering lab and supports the IT infrastructure team for its converged voice implementation.

Extreme Networks met VMware’s demanding requirements for highly resilient and scalable network connectivity. Today, VMware’s thousands of employees across multiple campuses are served by Extreme Networks’ leading Ethernet switching solutions featuring 10 Gigabit Ethernet, Gigabit Ethernet and Fast Ethernet, all powered by the ExtremeXOS® modular operating system.


“We required a robust, feature rich and energy efficient network to handle our data, virtualised applications and converged voice, and we achieved this through a trusted vendor like Extreme Networks, as they help it to achieve maximum availability so that we can drive continuous development,” said Drew Kramer, senior director of technical operations and R&D for VMware. “Working with Extreme Networks, from its high performance products to its knowledgeable and dedicated staff, has resulted in a world class infrastructure.”

Nice to see technology win out for once instead of back room deals which often end up screwing the customer over in the long run.

Since I'm here, I guess I should mention the release of the X460 series of switches, which came out a week or two ago, intended to replace the now 4-year-old X450 series (both "A" and "E"). Notable differences and improvements include:

  • Dual hot swap internal power supplies
  • User swappable fan tray
  • Long distance stacking over 10GbE - up to 40 kilometers
  • Clear-Flow now available when the switches are stacked (prior hardware switches could not be stacked to use Clear-Flow)
  • Stacking module is now optional (on the X450 it was built in)
  • Standard license is Edge license (X450A was Advanced Edge) - still software upgradable all the way to Core license (BGP etc). My favorite protocol ESRP requires Advanced Edge and not Core licensing.
  • Hardware support for IPFIX, which they say is complementary to sFlow
  • Lifetime hardware warranty with advanced hardware replacement (X450E had lifetime, X450A did not)
  • Layer 3 Virtual Switching (yay!) - I first used this functionality on the Black Diamond 10808 back in 2005, it's really neat.

The X460 seems to be aimed at the mid to upper range of GbE switches, with the X480 being the high end offering.


New NetApp boxes

TechOps Guy: Nate

So it looks like NetApp launched some beefy new systems yesterday, though I've got to say, if I were a customer of theirs I would feel kind of cheated on the 3200 series systems, since they have stuck with dual-core processors when quad-core has been available forever. In the "world of Intel", in my eyes there's no excuse to release anything that's not at least quad core, unless you're trying to squeeze your customers for every last bit (which I'm sure they are...).

Companies like NetApp could take a hint from someone like Citrix, which has a few Netscaler load balancers where they rate-limit the throughput in software but give you the same hardware as the higher-end boxes. Take the 17500 model rated for 20Gbps: you can software-upgrade it to more than double the throughput, to 50Gbps. But the point isn't the increased throughput via the software upgrade. The point is having the extra CPU horsepower on the smaller box, so that you can enable more CPU-intensive features without a noticeable performance hit, because you have so much headroom CPU-wise.

NetApp introduced compression as one of their new features (I think it's new; I may be wrong). That is of course likely to be a fairly CPU-intensive operation. If they had quad- or hex-core CPUs in there, you could do a lot more, even if they limited your IOPS or throughput to some amount. Maybe they don't have a good way of artificially rate limiting.

But even without rate limiting, it would cost them a trivial amount of money to put in quad-core processors; they just want to squeeze their customers.

Even 3PAR put quad core processors in their F400 system more than a year ago. This is despite the Intel CPUs not doing much work on the 3PAR side, most of the work is done by their Gen3 ASIC. But they realize it's a trivial cost to put in the beefier processor so they do it.

Their new 6200 series controllers do have quad-core processors, among other improvements I'm sure. The previous 6000 series was quad-socket. (In case you're wondering where I'm getting these processor stats, they're from the SPEC disclosures.)

NetApp was fast to post both SPEC SFS results for their 3200 and 6200 series, as well as SPC-1E results for their 3200.

All in all, very impressive results for SPEC SFS and very efficient results for SPC-1, both heavily assisted by 1TB of their flash cache. Interestingly enough, at least on the SPC-1 side, since full cost disclosures are there, the cost per usable TB and cost per IOP still don't match those of the F400 (which has many more drives, runs RAID 1+0, and is more than a year old, so I would consider the F400 at a great disadvantage, but it still wins out). SPC-1E isn't a full SPC-1 test though; it's more about power efficiency than raw performance. So time will tell if they do a "regular" SPC-1 test. Their SPC-1E IOPS is about the same as their 3170's, and the 3270 has much faster CPUs, so I'd think it's pretty safe to say the controllers have capacity to go beyond 68,000 IOPS.

Nice upgrade for their customers in any case.


Filed under: Storage No Comments

Next Gen Opterons — to 20 cores and beyond?

TechOps Guy: Nate

So I came across this a short time ago, but The Register has a lot more useful information here.

From AMD

The server products (“Interlagos” and “Valencia”) will first begin production in Q2 2011, and we expect to launch them in Q3 2011. [This includes the Opteron 6100 socket compatible 16-core Opteron 6200]


Since Bulldozer is designed to fit into the same power/thermal envelope as our current AMD Opteron™ 6100/4100 series processors we obviously have some new power tricks up our sleeve.  One of these is the new CC6 power state, which powers down an entire module when it is idle. That is just one of the new power innovations that you’ll see with Bulldozer-based processors.


We have disclosed that we would include AMD Turbo CORE technology in the past, so this should not be a surprise to anyone. But what is news is the uplift – up to 500MHz with all cores fully utilized. Today’s implementations of boost technology can push up the clock speed of a couple of cores when the others are idle, but with our new version of Turbo CORE you’ll see full core boost, meaning an extra 500MHz across all 16 threads for most workloads.


We are anticipating about a 50% increase in memory throughput with our new “Bulldozer” integrated memory controller.

From The register

Newell showed off the top-end "Terramar" Opteron, which will have up to 20 of a next-generation Bulldozer cores in a single processor socket, representing a 25 percent boost in cores from the top-end Interlagos parts, and maybe a 35 to 40 per cent boost in performance if the performance curve stays the same as the jump from twelve-core "Magny-Cours" Opteron 6100s to the Interlagos chips.


That said, AMD is spoiling for a fight about chip design in a way that it hasn't been since the mid-2000s.


with Intel working on its future "Sandy Bridge" and "Ivy Bridge" Xeon processors for servers, and facing an architecture shift in the two-socket space in 2011 that AMD just suffered through in 2010.

Didn't Intel just go through an architecture shift in the two-socket space last year with the Xeon 5500s and their integrated memory controller? And they are shifting architectures again so soon? Granted, I haven't really looked into what these new Intel chips have to offer.

I suppose my only question is: will VMware come up with yet another licensing level to go beyond 12 cores per socket? It's kind of suspicious that both vSphere Advanced and Enterprise Plus are called out with a limit of 12 cores per socket.


The cool kids are using it

TechOps Guy: Nate

I just came across this video, an animated clip of a PHP web developer ranting to a psychologist about how stupid the entire Ruby movement is. It's really funny.

I remember being in a similar situation a few years ago. The company had a Java application which drove almost all of the revenue of the company (90%+), and a Perl application that they had acquired from a ~2-person company and were busy trying to re-write in Java.

Enter stage left: Ruby. At that point (sometime in 2006/2007), I honestly don't think I had ever heard of Ruby before. But a bunch of the developers really seemed to like it, specifically the whole Ruby on Rails thing. We ran it on top of Apache with FastCGI. It really didn't scale well at all (for fairly obvious reasons that are documented everywhere online). As time went on, the company lost more and more interest in the Java applications and wanted to do everything in Ruby. It was cool (for them). Fortunately scalability was never an issue for this company since they had no traffic; at their peak they had four web servers that on average peaked at about 30-35% CPU.

It was a headache for me because of all of the modules they wanted to install on the system, and I was not about to use "gem install" to install them (that is the "Ruby way"; I won't install directly from CPAN either, BTW). I wanted proper version-controlled RPMs, so I built them, for the five different operating platforms we supported at the time (CentOS 4 32/64-bit, CentOS 5 32/64-bit, and Fedora Core 4 32-bit; we were in transition to CentOS 5 32/64-bit). Looking back at my cfengine configuration file, there were a total of 108 packages I built while I was there to support them, and it wasn't a quick task.

Then add the fact that they were running on top of Oracle (which is a fine database IMO), mainly because that was what they already had running with their Java app. But using Oracle wasn't the issue; the issue was that their Oracle database driver didn't support bind variables. If you have spent time with Oracle, you know this is a bad thing. We used a hack which involved setting a per-session environment variable in the database to force bind variables to be enabled. This was OK most of the time, but it did cause major issues for a few months when a bad query got into the system, caused the execution plans to get out of whack, and led to massive latch contention. The fastest way to recover the system was to restart Oracle.

The developers, and my boss at the time, were convinced it was a bug in Oracle. I was convinced it was not, because I had seen latch contention in action several times in the past. After a lot of debugging of the app and the database, in consultation with our DBA consultants, they figured out what the problem was: bad queries being issued from the app. Oracle was doing exactly what they told it to do, even if that meant causing a big outage. Latch contention is one of the performance limits of Oracle that you cannot solve by adding more hardware. It can seem that way at first, because the result is that throughput drops to the floor and CPUs go to 100% usage instantly.
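For anyone who hasn't hit this: the difference between literal SQL and bind variables is easy to show. This sketch uses Python's sqlite3 as a stand-in (the original app was Ruby against Oracle); the table and values are made up, but the principle is the same: with literals, every distinct value produces a distinct SQL text that Oracle must hard-parse, which is what hammers the shared pool and its latches under load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'nate')")

user_id = 1

# Without bind variables: the value is baked into the SQL text, so every
# distinct id is a brand-new statement to parse and plan.
row_bad = conn.execute(f"SELECT name FROM users WHERE id = {user_id}").fetchone()

# With bind variables: one SQL text, one cached plan, values passed
# separately (and no SQL injection risk as a bonus).
row_good = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()

print(row_bad == row_good)  # True
```

Same result rows either way; the difference only shows up in parse overhead and shared-pool churn at scale.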

At one point, to try to improve performance and get rid of memory leaks, I migrated the Ruby apps from FastCGI to mod_fcgid, which had a built-in ability to automatically restart its worker processes after they had served X number of requests. This worked out great and really helped improve operations. I don't recall if it had any real impact on performance, but because the memory leaks were no longer a concern, that was one less thing to worry about.
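For what it's worth, in current mod_fcgid releases the recycling knob is FcgidMaxRequestsPerProcess (older releases spelled it MaxRequestsPerProcess); a minimal sketch of the idea, with made-up numbers:

```apache
# Recycle each FastCGI worker process after it has served 500 requests,
# so a slowly leaking Ruby app never accumulates much memory.
FcgidMaxRequestsPerProcess 500
# Cap the total number of worker processes mod_fcgid will spawn.
FcgidMaxProcesses 32
```

The exact directive names depend on your mod_fcgid version, so check the module documentation for the release you're running.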

Then one day we got in some shiny new HP DL380 G5s, dual-proc quad-core machines with either 8 or 16GB of memory. Very powerful, very nice servers for the time. So what was the first thing I tried? I wanted to try out 64-bit, to take better advantage of the larger amount of memory. So I compiled our Ruby modules for 64-bit, installed 64-bit CentOS (5.2, I think it was at the time; the other production web servers were running 32-bit CentOS 5.2), installed 64-bit Ruby, etc. I launched the apps, and from a functional perspective they worked fine. But from a practical perspective it was worthless.

I enabled the web server in production and it immediately started gagging on its own blood: load shot through the roof, and requests were slow as hell. So I disabled it, and things returned to normal. I tried that a few more times and ended up giving up; I went back to 32-bit. The 32-bit system could handle 10x the traffic of the 64-bit system. I never found out what the issue was before I left the company.

From an operational perspective, my own personal preference for web apps is to run Java. I'm used to running Tomcat myself, but really the container matters little to me. I like war files; they make deployment so simple. And in the WebLogic world I liked ear files (I suspect they're not WebLogic-specific; it's just the only place I've ever used them). One archive file that has everything you need built into it; any extra modules etc. are all there. I don't have to compile anything: install a JVM, install a container, and drop in a single file to run the application. OK, maybe some applications have a few config files (one I used to manage had literally several hundred XML config files; poor design of course).

Maybe it's not cool anymore to run Java, I don't know. But seeing this video reminded me of those days when I did have to support Ruby on production and pre-production systems; it wasn't fun, or cool.
