TechOpsGuys.com Diggin' technology every day

August 23, 2010

HP FlexFabric module launched

Filed under: Datacenter,Networking,Storage,Virtualization — Tags: , , , , — Nate @ 5:03 pm

While they announced it a while back, it seems the HP VirtualConnect FlexFabric Module is now available for purchase for $18,500 (web price). Pretty impressive technology, sort of a mix between FCoE and combining a Fibre Channel switch and a 10Gbps Flex10 switch into one. The switch has two ports on it that can (apparently) uplink directly to Fibre Channel at 2/4/8Gbps. I haven’t read too much into it yet but I assume it can uplink directly to a storage array, unlike the previous Fibre Channel Virtual Connect module which had to be connected to a switch first (due to NPIV).

HP Virtual Connect FlexFabric 10Gb/24-port Modules are the simplest, most flexible way to connect virtualized server blades to data or storage networks. VC FlexFabric modules eliminate up to 95% of network sprawl at the server edge with one device that converges traffic inside enclosures and directly connects to external LANs and SANs. Using Flex-10 technology with Fibre Channel over Ethernet and accelerated iSCSI, these modules converge traffic over high speed 10Gb connections to servers with HP FlexFabric Adapters (HP NC551i or HP NC551m Dual Port FlexFabric 10Gb Converged Network Adapters or HP NC553i 10Gb 2-port FlexFabric Converged Network Adapter). Each redundant pair of Virtual Connect FlexFabric modules provide 8 adjustable connections (six Ethernet and two Fibre Channel, or six Ethernet and 2 iSCSI or eight Ethernet) to dual port 10Gb FlexFabric Adapters. VC FlexFabric modules avoid the confusion of traditional and other converged network solutions by eliminating the need for multiple Ethernet and Fibre Channel switches, extension modules, cables and software licenses. Also, Virtual Connect wire-once connection management is built-in enabling server adds, moves and replacement in minutes instead of days or weeks.

[..]

  • 16 x 10Gb Ethernet downlinks to server blade NICs and FlexFabric Adapters
  • Each 10Gb downlink supports up to 3 FlexNICs and 1 FlexHBA or 4 FlexNICs
  • Each FlexHBA can be configured to transport either Fiber Channel over Ethernet/CEE or Accelerated iSCSI protocol.
  • Each FlexNIC and FlexHBA is recognized by the server as a PCI-e physical function device with adjustable speeds from 100Mb to 10Gb in 100Mb increments when connected to a HP NC553i 10Gb 2-port FlexFabric Converged Network Adapter or any Flex-10 NIC and from 1Gb to 10Gb in 100Mb increments when connected to a NC551i Dual Port FlexFabric 10Gb Converged Network Adapter or NC551m Dual Port FlexFabric 10Gb Converged Network Adapter
  • 4 SFP+ external uplink ports configurable as either 10Gb Ethernet or 2/4/8Gb auto-negotiating Fibre Channel connections to external LAN or SAN switches
  • 4 SFP+ external uplink ports configurable as 1/10Gb auto-negotiating Ethernet connected to external LAN switches
  • 8 x 10Gb SR, LR fiber and copper SFP+ uplink ports (4 ports also support 10Gb LRM fiber SFP+)
  • Extended list of direct attach copper cable connections supported
  • 2 x 10Gb shared internal cross connects for redundancy and stacking
  • HBA aggregation on FC configured uplink ports using ANSI T11 standards-based N_Port ID Virtualization (NPIV) technology
  • Allows up to 255 virtual machines running on the same physical server to access separate storage resources
  • Up to 128 VLANs supported per Shared Uplink Set
  • Low latency (1.2 µs Ethernet ports and 1.7 µs Enet/Fibre Channel ports) throughput provides switch-like performance.
  • Line Rate, full-duplex 240Gbps bridging fabric
  • MTU up to 9216 Bytes – Jumbo Frames
  • Configurable up to 8192 MAC addresses and 1000 IGMP groups
  • VLAN Tagging, Pass-Thru and Link Aggregation supported on all uplinks
  • Stack multiple Virtual Connect FlexFabric modules with other VC FlexFabric, VC Flex-10 or VC Ethernet Modules across up to 4 BladeSystem enclosures allowing any server Ethernet port to connect to any Ethernet uplink

Management

  • Pre-configure server I/O configurations prior to server installation for easy deployment
  • Move, add, or change server network connections on the fly without LAN and SAN administrator involvement
  • Supported by Virtual Connect Enterprise Manager (VCEM) v6.2 and higher for centralized connection and workload management for hundreds of Virtual Connect domains. Learn more at: www.hp.com/go/vcem
  • Integrated Virtual Connect Manager included with every module, providing out-of-the-box, secure HTTP and scriptable CLI interfaces for individual Virtual Connect domain configuration and management.
  • Configuration and setup consistent with VC Flex-10 and VC Fibre Channel Modules
  • Monitoring and management via industry standard SNMP v.1 and v.2
  • Role-based security for network and server administration with LDAP compatibility
  • Port error and Rx/Tx data statistics displayed via CLI
  • Port Mirroring on any uplink provides network troubleshooting support with Network Analyzers
  • IGMP Snooping optimizes network traffic and reduces bandwidth for multicast applications such as streaming applications
  • Recognizes and directs Server-Side VLAN tags
  • Transparent device to the LAN Manager and SAN Manager
  • Provisioned storage resource is associated directly to a specific virtual machine – even if the virtual server is re-allocated within the BladeSystem
  • Server-side NPIV removes storage management constraint of a single physical HBA on a server blade
  • Does not add to SAN switch domains or require traditional SAN management
  • Centralized configuration of boot from iSCSI or Fibre Channel network storage via Virtual Connect Manager GUI and CLI
  • Remotely update Virtual Connect firmware on multiple modules using Virtual Connect Support Utility 1.5.0

Options

  • Virtual Connect Enterprise Manager (VCEM), provides a central console to manage network connections and workload mobility for thousands of servers across the datacenter
  • Optional HP 10Gb SFP+ SR, LR, and LRM modules and 10Gb SFP+ Copper cables in 0.5m, 1m, 3m, 5m, and 7m lengths
  • Optional HP 8 Gb SFP+ and 4 Gb SFP optical transceivers
  • Supports all Ethernet NICs and Converged Network adapters for BladeSystem c-Class server blades: HP NC551i 10Gb FlexFabric Converged Network Adapters, HP NC551m 10Gb FlexFabric Converged Network Adapters, 1/10Gb Server NICs including LOM and Mezzanine card options and the latest 10Gb KR NICs
  • Supports use with other VC modules within the same enclosure (VC Flex-10 Ethernet Module, VC 1/10Gb Ethernet Module, VC 4 and 8 Gb Fibre Channel Modules).

So in effect this allows you to cut down on the number of switches per chassis from four to two, which can save quite a bit. HP had a cool graphic showing the amount of cables that are saved even against Cisco UCS but I can’t seem to find it at the moment.

The most recently announced G7 blade servers have the new FlexFabric technology built in (which is also backwards compatible with Flex10).

VCEM seems pretty scalable

Built on the Virtual Connect architecture integrated into every BladeSystem c-Class enclosure, VCEM provides a central console to administer network address assignments, perform group-based configuration management and to rapidly deployment, movement and failover of server connections for 250 Virtual Connect domains (up to 1,000 BladeSystem enclosures and 16,000 blade servers).

With each enclosure consuming roughly 5kW with low voltage memory and power capping, 1,000 enclosures should consume roughly 5 Megawatts? From what I see “experts” say it costs roughly ~$18 million per megawatt for a data center, so one VCEM system can manage a $90 million data center, that’s pretty bad ass. I can’t think of who would need so many blades..
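That back-of-the-envelope math is easy to check. Here is a minimal Python sketch of it; the constants are just the rough figures from the paragraph above (5kW per enclosure, ~$18 million per megawatt), and the function names are made up for illustration.

```python
# Rough figures taken from the post; none of these are measured values.
ENCLOSURES = 1_000            # VCEM's stated enclosure limit
KW_PER_ENCLOSURE = 5          # rough per-enclosure draw assumed in the post
COST_PER_MW_USD = 18_000_000  # rough data center build cost per megawatt


def total_megawatts(enclosures, kw_each):
    """Total power draw in megawatts for a number of enclosures."""
    return enclosures * kw_each / 1_000


def facility_cost_usd(megawatts, cost_per_mw):
    """Estimated facility cost for a given power capacity."""
    return megawatts * cost_per_mw


if __name__ == "__main__":
    mw = total_megawatts(ENCLOSURES, KW_PER_ENCLOSURE)
    cost = facility_cost_usd(mw, COST_PER_MW_USD)
    print(f"{mw:.1f} MW, ${cost / 1e6:.0f} million")
```

Swap in your own per-enclosure draw or cost per megawatt to see how quickly the total moves.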

If I were building a new system today I would probably get this new module, but I would have to think hard about sticking with the regular Fibre Channel module to allow the technology to bake a bit more for storage.

The module is built based on Qlogic technology.

HP to the rescue

Filed under: Datacenter,Events,News,Storage — Tags: , , , , — Nate @ 6:03 am

Knock knock.. HP is kicking down your back door 3PAR..

Well that’s more like it, HP offered $1.6 Billion to acquire 3PAR this morning topping Dell’s offer by 33%. Perhaps the 3cV solution can finally be fully backed by HP. More info from The Register here. And more info on what this could mean to HP and 3PAR products from the same source here.

3PAR’s website is having serious issues; this has obviously spawned a ton of interest in the company. I get intermittent blank pages and connection refused messages.

I didn’t wake my rep up for this one.

The 3cV solution was announced about three years ago –

Elements of the 3cV solution include:

  • 3PAR InServ® Storage Servers—highly virtualized, tiered-storage arrays built for utility computing. Organizations creating virtualized IT infrastructures for workload consolidation use InServ arrays to reduce the cost of allocated storage capacity, storage administration, and SAN infrastructure.
  • HP BladeSystem c-Class Server Blades—the leading blade server infrastructure on the market for datacenters of all sizes. HP BladeSystem c-Class server blades minimize energy and space requirements and increase administrative productivity through advantages in I/O virtualization, powering and cooling, and manageability.
  • VMware vSphere—the leading virtualization platform for industry-standard servers. VMware vSphere helps customers reduce capital and operating expenses, improve agility, ensure business continuity, strengthen security, and go green.

While I could not find the image that depicts the 3cV solution (not sure how long it’s been gone for), here is more info on it for posterity.

The Advantages of 3cV
3cV offers combined benefits that enable customers to manage and scale their server and storage environments simply, allowing them to halve server, storage and operational costs while lowering the environmental impact of the datacenter.

  • Reduces storage and server costs by 50%—The inherently modular architectures of the HP BladeSystem c-Class and the 3PAR InServ Storage Server—coupled with the increased utilization provided by VMware Infrastructure and 3PAR Thin Provisioning—allow 3cV customers to do more with less capital expenditure. As a result, customers are able to reduce overall storage and server costs by 50% or more. High levels of availability and disaster recovery can also be affordably extended to more applications through VMware Infrastructure and 3PAR thin copy technologies.
  • Cuts operational costs by 50% and increases business agility—With 3cV, customers are able to provision and change server and storage resources on demand. By using VMware Infrastructure’s capabilities for rapid server provisioning and the dynamic optimization provided by VMware VMotion and Distributed Resource Scheduler (DRS), HP Virtual Connect and Insight Control management software, and 3PAR Rapid Provisioning and Dynamic Optimization, customers are able to provision and re-provision physical servers, virtual hosts, and virtual arrays with tailored storage services in a matter of minutes, not days. These same technologies also improve operational simplicity, allowing overall server and storage administrative efficiency to increase by 3x or more.
  • Lowers environmental impact—With 3cV, customers are able to cut floor space and power requirements dramatically. Server floor space is minimized through server consolidation enabled by VMware Infrastructure (up to 70% savings) and HP BladeSystem density (up to 50% savings). Additional server power requirements are cut by 30% or more through the unique virtual power management capabilities of HP Thermal Logic technology. Storage floor space is reduced by the 3PAR InServ Storage Server, which delivers twice the capacity per floor tile as compared to alternatives. In addition, 3PAR thin technologies, Fast RAID 5, and wide striping allow customers to power and cool as much as 75% less disk capacity for a given project without sacrificing performance.
  • Delivers security through virtualization, not dedicated hardware silos—Whereas traditional datacenter architectures force tradeoffs between high resource utilization and the need for secure segregation of application resources for disparate user groups, 3cV resolves these competing needs through advanced virtualization. For instance, just as VMware Infrastructure securely isolates virtual machines on shared severs, 3PAR Virtual Domains provides secure “virtual arrays” for private, autonomous storage provisioning from a single, massively-parallel InServ Storage Server.

Though due to the recent stack wars it’s been hard for 3PAR to partner with HP to promote this solution since I’m sure HP would rather push their own full stack. Well hopefully now they can. The best of both worlds technology wise can come together.

More details from 3PAR’s VMware products site.

From HP’s offer letter

We propose to increase our offer to acquire all of 3PAR outstanding common stock to $24.00 per share in cash. This offer represents a 33.3% premium to Dell’s offer price and is a “Superior Proposal” as defined in your merger agreement with Dell. HP’s proposal is not subject to any financing contingency. HP’s Board of Directors has approved this proposal, which is not subject to any additional internal approvals. If approved by your Board of Directors, we expect the transaction would close by the end of the calendar year.

In addition to the compelling value offered by our proposal, there are unparalleled strategic benefits to be gained by combining these two organizations. HP is uniquely positioned to capitalize on 3PAR’s next-generation storage technology by utilizing our global reach and superior routes to market to deliver 3PAR’s products to customers around the world. Together, we will accelerate our ability to offer unmatched levels of performance, efficiency and scalability to customers deploying cloud or scale-out environments, helping drive new growth for both companies.
As a Silicon Valley-based company, we share 3PAR’s passion for innovation.
[..]

We understand that you will first need to communicate this proposal and your Board’s determinations to Dell, but we are prepared to execute the merger agreement immediately following your termination of the Dell merger agreement.

Music to my ears.

[tangent — begin]

My father worked for HP in the early days back when they were even more innovative than they are today, he recalled their first $50M revenue year. He retired from HP in the early 90s after something like 25-30 years.

I attended my freshman year at Palo Alto Senior High school, and one of my classmates/friends (actually I don’t think I shared any classes with him now that I think about it) was Ben Hewlett, grandson of one of the founders of HP. Along with a couple other friends, Ryan and Jon, we played a bunch of RPGs (I think the main one was Twilight 2000, something one of my other friends Brian introduced me to in 8th grade).

I remember asking Ben one day why he took Japanese as his second language course when it was significantly more difficult than Spanish (which was the easy route, probably still is?). I don’t think I’ll ever forget his answer. He said “because my father says it’s the business language of the future..”

How times have changed.. Now it seems everyone is busy teaching their children Chinese. I’m happy knowing English, and a touch of bash and perl.

I never managed to keep in touch with my friends from Palo Alto, after one short year there I moved back to Thailand for two more years of high school there.

[tangent — end]

HP could do some cool stuff with 3PAR, which has much better technology overall. I have no doubt HP has its eyes on its HDS partnership, and the possibility of replacing the XP line with 3PAR technology in the future has got to be pretty enticing. HDS hasn’t done a whole lot recently, and I read not long ago that regardless of what HP says, they don’t have much (if any) input into the HDS product line.

The HP USP-V OEM relationship is with Hitachi SSG. The Sun USP-V reseller deal was struck with HDS. Mikkelsen said: “HP became a USP-V OEM in 2004 when the USP-V was already done. HP had no input to the design and, despite what they say, very little input since.” HP has been a Hitachi OEM since 1999.

Another interesting tidbit of information from the same article:

It [HDS] cannot explain why it created the USP-V – because it didn’t, Hitachi SSG did, in Japan, and its deepest thinking and reasons for doing so are literally lost in translation.

The loss of HP as an OEM customer of HDS, so soon after losing Sun as an OEM customer, would be a really serious blow to HDS (one person I know claimed it accounts for ~50% of their business), which seems to have a difficult time selling stuff in western countries; I’ve read it’s mostly because of their culture. Similarly it seems Fujitsu has issues selling stuff in the U.S. at least; they seem to have some good storage products but not much attention is paid to them outside of Asia (and maybe Europe). Will HDS end up like Fujitsu as a result of HP buying 3PAR? Not right away for sure, but longer term they stand to lose a ton of market share in my opinion.

And with the USP getting a little stale (rumor has it they are near to announcing a technology refresh for it), it would be good timing for HP to get 3PAR, to cash in on the upgrade cycle by getting customers to go with the T class arrays instead of the updated USP whenever possible.

I read on an HP blog earlier in the year an interesting comment –

The 3PAR is drastically less expensive than an XP, but is an active/active concurrent design, can scale up to 8 clustered controllers, highly virtualized, customers can self-install, self-maintain, and requires no professional services. Its on par with the XP in terms of raw performance, but has the ease of use of the EVA. Like the XP, the 3PAR can be carved up into virtual domains so that service providers or multi-tenant arrays can have delegated administration.

I still think 3PAR is worth more, and should stay independent, but given the current situation would much rather have them in the arms of HP than Dell.

Obviously those analysts that said Dell paid too much for 3PAR were wrong, and didn’t understand the value of the 3PAR technology. HP does, otherwise they wouldn’t be offering 33% more cash.

After the collapse of so many of 3PAR’s NAS partners over the past couple of years, the possibility of having Ibrix available again for a longer term solution is pretty good. Dell bought Exanet’s IP earlier in the year. LSI owns Onstor, HP bought Polyserve and Ibrix. Really just about no “open” NAS players left. Isilon seems to be among the biggest NAS players left but of course their technology is tightly integrated into their disk drive systems, same with Panasas.

Maybe that recent legal investigation into the board at 3PAR had some merit after all.

Dell should take their $billion and shove it in Pillar’s (or was it Compellent? I forgot) face, so the CEO there can make his dream of being a billion dollar storage company come true, if only for a short time.

I’m not a stock holder or anything, I don’t buy stocks (or bonds).

August 15, 2010

Lowest power dual socket server ever

Filed under: Datacenter,General — Tags: , , — Nate @ 12:20 pm

This was posted a couple of weeks ago but I was on vacation at the time and didn’t notice it until a few days ago.

It talks about the latest 4000-series low power chips from AMD running in a dual socket system from ZT Systems.

The numbers are pretty startling. At peak load they measure the power draw at only 126 watts for the system as a whole:

  • Dual processor 6-core Opteron 4164 EE (1.8Ghz per core)
  • 16GB memory (4x4GB DDR3-1333)
  • 128GB SSD

From the blog:

[..] There are four major enhancements to the AMD Opteron™ 4000 Series platform which significantly lower server power consumption:

  1. The AMD Opteron™ 4100 EE Series of processors are the lowest power AMD Opteron processors ever. These processors are rated at 32W ACP, which is 20% lower than AMD’s previous generation 2400 EE Series processors.
  2. AMD Opteron™ 4100 Series processors support 1.35V DDR3 memory, enabling lower server power consumption at load.
  3. The AMD Opteron™ 4000 Series platform uses low-power chipsets. The SR5650 has a maximum TDP of only 13 watts.
  4. AMD Opteron™ 4100 Series processors include new AMD-P power management features, including C1E. C1E is a feature that helps reduce the power consumption of the AMD Opteron™ 4100 Series processor’s integrated memory controller and HyperTransport™ technology links.

[..]
The two lowest power Intel Xeon processor-based servers consume 28% more and 34% more power than the ZT Systems 1253Ra Datacenter Server[..]

Pretty amazing that you can get a dual processor, 12 core (total) system running at less power than some CPUs out there consume by themselves.
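To put the numbers side by side, here is a rough Python sketch of the power budget. It only uses figures quoted above (126W measured peak, 32W ACP per processor, 13W maximum chipset TDP, and the 28%/34% Intel comparison). Two assumptions to keep in mind: ACP and TDP are ratings rather than measurements, so the "remaining" figure is only a ballpark, and applying the 28%/34% figures to the 126W peak assumes the blog's comparison was made on that same measurement, which the excerpt does not confirm.

```python
# Figures quoted in the post; ratings (ACP/TDP) are not measured draw.
SYSTEM_PEAK_W = 126      # measured whole-system peak
CPU_ACP_W = 32           # ACP rating per Opteron 4100 EE processor
SOCKETS = 2
CHIPSET_MAX_TDP_W = 13   # SR5650 maximum TDP


def remaining_budget(system_w, cpu_w, sockets, chipset_w):
    """Watts left for memory, SSD, fans and PSU losses (ballpark only)."""
    return system_w - cpu_w * sockets - chipset_w


def scaled(base_w, percent_more):
    """Power of a system that draws percent_more percent more than base_w."""
    return base_w * (1 + percent_more / 100)


if __name__ == "__main__":
    rest = remaining_budget(SYSTEM_PEAK_W, CPU_ACP_W, SOCKETS, CHIPSET_MAX_TDP_W)
    print(f"Left over after CPUs and chipset: {rest} W")
    for pct in (28, 34):
        print(f"{pct}% more than {SYSTEM_PEAK_W} W is {scaled(SYSTEM_PEAK_W, pct):.1f} W")
```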

I’m sure it will run at even lower power with rack level DC power and cooling.

August 7, 2010

Container trailer park near Seattle

Filed under: Datacenter — Nate @ 11:35 am

That was quick. Not too long ago I was thinking about the prospects of having a container data center trailer park of sorts. To date all of the container data centers I have seen talked about have been pretty specialised, hosted by the company that bought the containers, not in a more common, neutral, co-location style.

But that seems to be changing. I came across the web site of a data center near Seattle today that is opening very soon, and one of its offerings is Ready-to-Go containerized data center space.

Space is available on the 92 acre campus for containerized data center use. With the infrastructure, including fiber and power, already in place users can deploy quickly in a highly efficient, scalable, customized environment.

Cool.

May 28, 2010

That’s not a knife…

Filed under: Datacenter,Storage — Tags: , , — Nate @ 9:10 pm

There’s been a lot of talk (no thanks to Cisco/EMC) about infrastructure blocks recently. Myself, I never liked the concept (and still don’t). I think it makes sense in the SMB world where you have very limited IT staff and they need a canned, integrated solution. Companies like HP and IBM have been selling these sorts of mini stacks for years. As for Microsoft I think they have a “Small business” version of their server platform which includes a bunch of things integrated together as well.

I think the concept falls apart at scale though, I’m a strong believer in best of breed technologies, and what is best of breed really depends on the requirements of the organization. I have my own favorites of course for the industries I’ve been working with/in for the past several years but I know they don’t apply to everyone.

I was reading up yesterday on some new containerized data centers that SGI released in their Ice Cube series. The numbers are just staggering.

In their most dense configuration, in 320 square feet of space consuming approximately 1 megawatt of power you can have either:

  • More than 45,000 CPU cores
  • More than 29 Petabytes of storage

In both cases you can get roughly 45kW per rack, while today most legacy data centers top out at between 2-5kW per rack.
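To get a feel for that density, here is a small Python sketch built only from the figures quoted above. The rack counts are my own division of the total load by the per-rack figure, not numbers from SGI, and they ignore any power going to cooling or other overhead.

```python
import math

# Figures quoted in the post.
CONTAINER_SQFT = 320          # container footprint
CONTAINER_KW = 1_000          # "approximately 1 megawatt"
DENSE_KW_PER_RACK = 45        # per-rack figure for the container
LEGACY_KW_PER_RACK = (2, 5)   # typical legacy data center range
CPU_CORES = 45_000


def watts_per_sqft(total_kw, sqft):
    """Power density in watts per square foot."""
    return total_kw * 1_000 / sqft


def racks_for_load(total_kw, kw_per_rack):
    """Racks needed to host a load at a given per-rack power limit."""
    return math.ceil(total_kw / kw_per_rack)


if __name__ == "__main__":
    print(f"{watts_per_sqft(CONTAINER_KW, CONTAINER_SQFT):.0f} W per square foot")
    print(f"{CPU_CORES / CONTAINER_SQFT:.1f} cores per square foot")
    print(f"{racks_for_load(CONTAINER_KW, DENSE_KW_PER_RACK)} racks at 45kW each")
    for kw in LEGACY_KW_PER_RACK:
        print(f"{racks_for_load(CONTAINER_KW, kw)} racks at {kw}kW each")
```

The same megawatt that fits in a couple dozen racks at 45kW would be spread across hundreds of racks in a legacy facility.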

Stop and think about that for a minute, think about the space, think about the density. 320 square feet is smaller than even a studio apartment, though in Japan it may be big enough to house a family of 10-12 (I hear space is tight over there).

How’s that for an infrastructure block? And yes, you can stack one on top of another:

ICE Cube utilizes an ISO standard commercially available 9.5′ x 8′ x 40′ container. SGI intentionally designed the offering such that the roof of the container is clear of obstruction and fully capable of utilizing its stacking container feature. Because of this, SGI is positioned to supply a compelling density multiplier for future expansion of the data center. If installed in a location without overhead height restriction the 9.5′ x 8′ x 40′ containers in our primary product offering can be stacked up to three-high, thus allowing customers to double or triple the per square foot density of the facility over the already industry-leading density of a single ICE Cube.

All of this made me think of a particular scene from an ’80s movie.

Really makes these other blocks some vendors are talking about sound like toys by comparison, doesn’t it?

May 3, 2010

Terremark vCloud Express: First month

Filed under: Datacenter,Virtualization — Tags: , , — Nate @ 3:02 pm

Not much to report. I got my first bill for my first “real” month of usage (minus DNS; I haven’t gotten round to transferring DNS yet, but I do have the ports opened).

$122.20 for the month which included:

  • 1 VM with 1VPU/1.5GB/40GB – $74.88
  • 1 External IP address – $0.00 (which is confusing I thought they charged per IP)
  • TCP/UDP ports – $47.15
  • 1GB of data transferred – $0.17

Kind of funny the one thing that is charged as I use it (the rest being charged as I provision it) I pay less than a quarter for. Obviously I slightly overestimated my bandwidth usage. And I’m sure they round to the nearest GB, as I don’t believe I even transferred 1GB during the month of April.

I suppose the one positive thing from a bandwidth and cost standpoint is that if I ever wanted to route all of my internet traffic from my cable modem at home through my VM (over VPN) for paranoia or security purposes, I could. I believe Comcast caps bandwidth at ~250GB/mo or something, which would be about $42/mo assuming I tapped it out (but believe me, my home bandwidth usage is trivial as well).
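For anyone who wants to check the bill and the what-if, here is a minimal Python sketch. The line items are copied from the bill above; the $0.17 per GB rate is inferred from the single 1GB line item, and the sketch assumes that rate stays flat at higher volumes, which I have not verified against Terremark's pricing.

```python
from decimal import Decimal

# Line items copied from the bill in the post.
LINE_ITEMS = {
    "VM (1VPU/1.5GB/40GB)": Decimal("74.88"),
    "External IP address": Decimal("0.00"),
    "TCP/UDP ports": Decimal("47.15"),
    "Data transferred (1GB)": Decimal("0.17"),
}

PRICE_PER_GB = Decimal("0.17")  # assumption: inferred from the 1GB line item
COMCAST_CAP_GB = 250            # approximate cap mentioned in the post


def bill_total(items):
    """Sum all line items on the bill."""
    return sum(items.values(), Decimal("0"))


def transfer_cost(gigabytes, price_per_gb=PRICE_PER_GB):
    """Cost of transferring a number of GB at a flat per-GB rate."""
    return gigabytes * price_per_gb


if __name__ == "__main__":
    print(f"Monthly total: ${bill_total(LINE_ITEMS)}")
    print(f"Cost at the {COMCAST_CAP_GB}GB cap: ${transfer_cost(COMCAST_CAP_GB)}")
```

Using `Decimal` instead of floats keeps the cents exact, which matters more than you would think once you start adding up line items.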

Hopefully this coming weekend I can get around to assigning a second external IP, mapping it to my same DNS and moving some of my domains over to this cloud instead of keeping them hosted on my co-located server. Just been really busy recently.

May 1, 2010

Data Center trailer parks

Filed under: Datacenter — Nate @ 10:30 am

OK, probably going further out on a limb here but for some reason the idea came to my head and I thought it would be a funny concept.

With all these new things coming up around container based data centers, there still remains a problem that needs to be solved: where do you get the power, cooling and networking?

So I imagined a trailer park of sorts for data centers where companies could drive their container data centers (which can contain well over one thousand systems per container) and plug them in to a network jack and get power, and a water feed.

Data centers of the future may end up just being giant parking lots (above or below ground) with some sort of industrial grade, easy to use connectors for plug and play containers. Maybe it goes even further and you are billed on just what you use automatically. An Ethernet jack or perhaps wireless connection at the site and you could authenticate to the facility and provision bandwidth, IP addresses, and plug in and turn on. The system would automatically meter the amount of water you draw, and perhaps even monitor the temperature of the return water feed (those that return it cooler will get charged less). And of course pay per kWh for power. Plus a flat rate fee for parking.
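Just to make the billing idea concrete, here is a toy Python sketch of what such a metered bill might look like. Everything in it is hypothetical: the rates, the temperature threshold, the surcharge, and the `Rates`/`Usage` names are all made up for illustration, since no such facility or price list exists in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical price list for a container lot (all values made up)."""
    per_kwh: float = 0.10                 # dollars per kWh of power
    per_gallon: float = 0.005             # dollars per gallon of water drawn
    warm_return_surcharge: float = 0.25   # extra 25% on water if returned warm
    return_temp_threshold_f: float = 85.0
    parking_flat: float = 2000.0          # flat monthly parking fee


@dataclass(frozen=True)
class Usage:
    """One month of metered usage for a single container."""
    kwh: float
    gallons: float
    avg_return_temp_f: float


def monthly_charge(usage, rates=Rates()):
    """Power per kWh, metered water (cheaper if returned cooler), flat parking."""
    water = usage.gallons * rates.per_gallon
    if usage.avg_return_temp_f > rates.return_temp_threshold_f:
        water *= 1 + rates.warm_return_surcharge
    power = usage.kwh * rates.per_kwh
    return round(power + water + rates.parking_flat, 2)


if __name__ == "__main__":
    cool = Usage(kwh=100_000, gallons=50_000, avg_return_temp_f=80)
    warm = Usage(kwh=100_000, gallons=50_000, avg_return_temp_f=95)
    print(monthly_charge(cool), monthly_charge(warm))
```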

Maybe power companies, water treatment facilities (or other common water providers) and carriers team up to provide some sort of common standard or technique to provide this kind of service.

Then perhaps add in IPv6, I think I’ve read about it having some good IP mobility features, or maybe you just get some sort of BGP feed where you can advertise your own IPs.

Then say some disaster strikes like a hurricane or earthquake, the facility is robust enough to handle it, but maybe the infrastructure around it is destroyed, go pick up your container and take it to another lot.

By the time I got mid way through this post the concept in my mind sounded more feasible than it was when it first came to mind.

April 30, 2010

Violate Electrical specs for more servers?

Filed under: Datacenter,General,Random Thought — Tags: — Nate @ 8:46 pm

As usual on big blog posts I often literally go back and re-read the post about 60 times and think about what I wrote.

Well, I was reading my most recent post about Datacenter Dynamics, specifically the stranded power section and the datacenter operators of hyperscale facilities wanting to draw every watt they can off the circuits to maximize efficiency, and I got to thinking..

Would they go so far as to violate electrical specs by drawing more than 80% of the power for a particular circuit? I mean in theory at least if they construct the components properly they can probably do it fairly safely. I learned a few years ago from someone, that the spec in question is NEC Section 384-16(c). Which I think in part reads:

The NEC requires the branch circuit computed load for conductor sizing to be sized at 125% of the continuous load, plus the noncontinuous load (100%).

Which equates to 80% utilization. If you know your power usage levels that well, and your loads etc, do you think such hyperscale facilities would run at 85%? 90%? 95% of circuit load? Really, with all of the other extreme measures being taken to maximize efficiency I wouldn’t put it past them. They’re going so far as to design special motherboards and have specific components down to the VRMs to lower power usage. I can see them investing in higher grade electrical gear allowing them to safely operate at higher circuit draws, especially when you take into account power capping as well. After all, if you’re spending the effort to shave literally single digit watt usage off your systems, that extra 20% capacity on the circuit has to be very tempting to use.
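Here is a minimal Python sketch of where the 80% comes from and what the extra headroom would be worth. The 125% factor is from the text quoted above; the 30A breaker and 208V single-phase circuit are assumptions I picked purely for illustration, so plug in your own values.

```python
def continuous_limit_amps(breaker_amps, sizing_factor=1.25):
    """Largest continuous load if the circuit must be sized at 125% of it."""
    return breaker_amps / sizing_factor


def usable_watts(breaker_amps, volts, utilization):
    """Watts available on a single-phase circuit at a given utilization."""
    return breaker_amps * volts * utilization


if __name__ == "__main__":
    amps, volts = 30, 208  # assumed circuit, not from the post
    limit = continuous_limit_amps(amps)
    print(f"Continuous limit: {limit:.0f}A ({limit / amps:.0%} of the breaker)")
    baseline = usable_watts(amps, volts, 0.80)
    for pct in (0.80, 0.85, 0.90, 0.95):
        watts = usable_watts(amps, volts, pct)
        print(f"{pct:.0%}: {watts:.0f}W (+{watts - baseline:.0f}W over 80%)")
```

On that assumed circuit, going from 80% to 95% is worth a bit over 900W, which is several low power servers per circuit.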

I remember a few years ago doing a load test on one of the aforementioned lower quality power strips (they weren’t cheap, but the company’s QA wasn’t up to par). It was a 30A PDU. I loaded it up with a bunch of systems, walked away for a couple of minutes, and came back and was shocked to see the meter reporting 32A was being drawn. I immediately yanked some of the power cords out to get it back under 30A. After talking with the manufacturer (or maybe it was another manufacturer, I don’t recall), they said that was not unexpected: the breaker has some sort of internal timer that will trip based on the amount of excess load on the circuit, so if you’re drawing 30A it probably won’t trip for a while, if you’re drawing 32A then it may trip after a few minutes, and if you try to draw 40A it will likely trip immediately (I’m guessing here).

April 29, 2010

Datacenter Dynamics

Filed under: Datacenter,Events — Nate @ 7:09 pm

For the past couple of years the folks behind the Datacenter Dynamics conferences have been hounding me to fork over the $500 fee to attend their conference. I’ve looked at it, and it really didn’t seem aimed at people like me; it is aimed more at people who build/design/manage data centers. I mostly use co-location space. While data center design is somewhat interesting to me, at least the leading edge technology, it’s not something I work with.

So a couple days ago a friend of mine offered to get me in for free so I took him up on his offer. Get away from the office, and the conference is about one mile from my apartment.

The keynote was somewhat interesting, given by a Distinguished Engineer from Microsoft. I suppose more than anything I thought some things he had to say were interesting to note. I’ll skip the obvious stuff, he had a couple of less obvious things to say –

Let OEMs innovate

One thing he says MS does is they have a dedicated team of people that instrument subsets of their applications and servers and gather performance data(if you know me you probably know I collect a lot of stats myself everything from apps, to OS, to network, load balancers, power strips, storage etc).

They take this data and build usage profiles for their applications, then come up with server design requirements that are as generic as possible while still being specific in certain areas:

  • X amount of CPU capacity
  • X amount of memory
  • X amount of disk
  • X amount of I/O
  • Power envelope at a per-rack and per-container basis
  • Operational temperature and humidity levels the systems will operate in

He raised the point that if you get too specific you tie the hands of the OEMs and they can’t get creative. He mentioned that on a few occasions they sent out RFPs and got back very different designs from different OEMs. He says they use 3 different manufacturers (two I know are SGI/Rackable and Dell, don’t know the third). They apparently aren’t big enough to deal with more OEMs (a strange statement, I thought) so they only work with a few.

Now I think for most organizations this really isn’t possible, as getting this sort of precise information isn’t easy, especially from the application level.

They seem to aim for operating the servers in up to mid 90 degree temperatures. Which is pretty extreme but these days not too uncommon among the hyper scale companies. I saw a resume of a MS operations guy recently that claimed he used monitoring software to watch over 800,000 servers.

They emphasized purpose built servers, eliminating components that are not needed to reduce costs and power usage, and using low power processors regardless of application. An emphasis on performance/watt/TCO$, I think, is what he said.

Stranded Power

He also emphasized eliminating stranded power: use every watt that is available to you, because stranded power is very expensive. To achieve this they leverage power capping in the servers to get more servers per rack; because they know what their usage profile is, they cap their servers at a certain level. I know HP has this technology, and I assume others do too, though I haven’t looked. One thing that was confusing to me when quizzing the HP folks on it was that it was server or chassis (in the case of blades) level. To me that doesn’t seem very efficient; I would expect it to at least be rack/circuit level. I mean, if the intelligence is already in the servers you should be able to aggregate that intelligence outside the individual servers to operate more as a collective, to gain even more efficiencies. You could potentially extend the concept to the container level as well.
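To put rough numbers on why capping lets you pack more servers into a rack, here is a minimal Python sketch. The rack budget, nameplate wattage, and cap value are all hypothetical; none of them came from the keynote.

```python
import math


def servers_per_rack(rack_budget_watts, per_server_watts):
    """How many servers fit in a rack power budget at a given per-server draw."""
    return math.floor(rack_budget_watts / per_server_watts)


def stranded_watts(rack_budget_watts, planned_per_server_watts, typical_draw_watts):
    """Budget that is provisioned but unused when servers really draw typical_draw_watts."""
    count = servers_per_rack(rack_budget_watts, planned_per_server_watts)
    return rack_budget_watts - count * typical_draw_watts


# ASSUMPTION: all three numbers are hypothetical, not from the keynote.
RACK_BUDGET = 5000  # usable watts per rack
NAMEPLATE = 400     # watts per server you must plan for without capping
CAP = 250           # watts per server enforced by power capping

print("servers planned by nameplate:", servers_per_rack(RACK_BUDGET, NAMEPLATE))
print("servers planned by cap:", servers_per_rack(RACK_BUDGET, CAP))
print("stranded watts when planning by nameplate:",
      stranded_watts(RACK_BUDGET, NAMEPLATE, CAP))
```

Planning by the cap instead of the nameplate is what recovers the stranded watts; the catch is that the cap has to match the real usage profile, which is why the instrumentation comes first.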

Idle servers still draw significant power

In my own experience measuring power levels at the PDU/CDU level over the past 6 years, I have seen that typical power usage fluctuates at most 10-15% from idle to peak (application peak, not really server peak). This guy from MS says that even with the most advanced “idle” states the latest generation CPUs offer, it only reduces overall power usage to about 50% of peak. So there still seems to be significant room for improving power efficiencies when a system is idle. Perhaps that is why Google is funding some project to this end.

MS’s Generation 4 data centers

I’m sure others have read about them, but you know me, I really don’t pay much attention to what MS does, really have not in a decade or more, so this was news to me.

He covered their evolution of data center design, topping out at what he called Gen3 data centers in Chicago and Ireland, container based.

Their generation 4 data centers are also container based but appear to be significantly more advanced from an infrastructure perspective than current data center containers. If you have Silverlight you can watch a video on it here; it’s the same video shown at the conference.

I won’t go into big details, I’m sure you can find them online, but the basics are that it is designed to be able to operate in seemingly almost any environment, using just basic outside air for cooling. If it gets too hot then a water system kicks in to cool the incoming air (an example was lowering ~105 degree air outside to ~95 degree air inside, cool enough for the servers). If it gets too cold then an air re-circulation system kicks in and circulates a portion of the exhaust from the servers back to the front of the servers to mix with incoming cold air. If it gets too humid it does something else to compensate (I forgot what).
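The decision logic described above boils down to a small mode selector. Here is a minimal Python sketch of it. Only the ~95 degree inlet figure comes from the talk; the cold and humidity thresholds, the mode names, and the order of the checks are assumptions for illustration, and the humidity response is left unspecified because I don’t know what it actually does.

```python
def cooling_mode(outside_temp_f, humidity_pct,
                 max_inlet_f=95.0, min_inlet_f=50.0, max_humidity_pct=80.0):
    """Pick a cooling mode from outside conditions.

    ASSUMPTION: min_inlet_f and max_humidity_pct are made-up defaults;
    only the ~95F inlet ceiling was mentioned in the talk.
    """
    if outside_temp_f > max_inlet_f:
        # Too hot: a water system cools the incoming air.
        return "water_cooling"
    if outside_temp_f < min_inlet_f:
        # Too cold: mix some server exhaust back into the intake.
        return "recirculate_exhaust"
    if humidity_pct > max_humidity_pct:
        # Too humid: some compensation happens (details unknown).
        return "humidity_compensation"
    return "outside_air_only"


print(cooling_mode(105, 30))
print(cooling_mode(20, 30))
print(cooling_mode(70, 40))
```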

They haven’t deployed it at any scale yet, so they don’t have hard data on things yet, but they have enough research to move forward with the project.

I’ll tell you what, I’m glad I don’t deal with server hardware anymore. These new data centers are running so hot I’d want them to run the cooling just for me; I can’t work in those temperatures, I’d die.

Server level UPS

You may have heard people like Google talking about using batteries in their servers. I never understood this concept myself. I can probably understand not using big centralized UPSs in your data center, but I would expect the logical move would be to rack level UPSs, ones that would take AC input and output DC power to the servers directly.

One of the main complaints about normal UPSs as far as efficiency goes is the double conversion that goes on: incoming AC converts to DC for the batteries, then back to AC to go to the racks. I think this is mainly because DC power isn’t very good for long distances. But there are already some designs on the market from companies like SGI (aka Rackable) for rack level power distribution (i.e. no power supplies in the servers). This is the Cloudrack product, something I’ve come to really like a lot since I first got wind of it in 2008.

If you have such a system, and I imagine Google does something similar, I don’t understand why they’d put batteries in the server instead of integrating them into the rack power distribution, but whatever, it’s their choice.

I attended a breakout session which talked about this topic presented by someone from the Electric Power Research Institute. The speaker got pretty technical into the electrical terminology beyond what I could understand but I got some of the basic concepts.

The most interesting claim he made was that 90% of electrical disruptions last less than two seconds. That alone was enough for me to understand why people are looking to server-level batteries instead of big centralized systems.

They did pretty extensive testing with system components and power disruptions and had some interesting results. Honestly I can’t really recite them, they were pretty technical. They involved measuring power disruptions in the number of cycles (each cycle is 1/60th of a second) and comparing the duration of the disruption with its magnitude (in their case voltage sags). So they measured the “breaking” point of equipment: what sort of disruptions can it sustain? He said for the most part power supplies are rated to handle 4 cycles of disruption, or 4/60ths of a second (about 67 milliseconds), likely without noticeable impact. Beyond that, in most cases equipment won’t get damaged but it will shut off/reboot.
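Since the talk measured everything in cycles, a few lines of Python make the conversions easier to follow. The 60 Hz frequency, the 4 cycle tolerance, and the 2 second window are from the talk; the function names are just mine for illustration.

```python
LINE_FREQUENCY_HZ = 60  # one cycle is 1/60th of a second


def cycles_to_ms(cycles, hz=LINE_FREQUENCY_HZ):
    """Convert a number of AC cycles to milliseconds."""
    return cycles / hz * 1000.0


def seconds_to_cycles(seconds, hz=LINE_FREQUENCY_HZ):
    """Convert a duration in seconds to a number of AC cycles."""
    return seconds * hz


def rides_through(disruption_cycles, tolerance_cycles=4):
    """True if a disruption is short enough for a typical power supply to absorb."""
    return disruption_cycles <= tolerance_cycles


print("4 cycles in ms:", round(cycles_to_ms(4), 1))
print("2 seconds in cycles:", seconds_to_cycles(2))
print("2 second outage survivable on a stock supply?",
      rides_through(seconds_to_cycles(2)))
```

The gap between 4 cycles and the 120 cycles in a 2 second disruption is the gap the extra energy storage is meant to cover.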

He also brought up that in their surveys it was very rare that power sags went below 60% of total voltage. That made me think about some older lower quality UPSs I used to have in combination with auto switching power supplies. What would happen is that when the UPS went to battery it caused a voltage spike, which convinced the auto switching power supplies to switch from 120V to 208V, and that immediately tripped their breaker/safety mechanism because the voltage returned to 120V within who knows how many cycles. I remember when I was ordering special high grade power supplies I had to request that they be hard set to 110V to avoid this. Eventually I replaced the UPSs with better ones and haven’t had the problem since.

But it got me thinking, could the same thing happen? I mean, pretty much all servers now have auto switching power supplies, and one of this organization’s tests involved 208V/220V power sags dropping as low as 60% of regular voltage. Could that convince the power supply it’s time to go to 120V? I didn’t get to ask.

They constructed a power supply board with some special capacitors, I believe they were (they were certainly not batteries, but they may have had another technical term that escapes me), which can store enough energy to ride through that 2 second window in which 90% of power problems occur. He talked about other components that would assist in the charging of this capacitor; since it was an order of magnitude larger than the regular ones in the system, there had to be special safeguards in place to prevent it from exploding or something when it charged up. Again, lots of electrical stuff beyond my areas of expertise (and interest really!).

They demonstrated it in the lab, and it worked well. He said there really isn’t anything like it on the market, and this is purely a lab thing; they don’t plan to make anything to sell. The power supply manufacturers are able to do this, but they are waiting to see if a market develops, if there will be demand in the general computing space to make such technology available.

I’d be willing to bet for the most part people will not use server level batteries. In my opinion it doesn’t make sense unless you’re operating at really high levels of scale. I believe for the most part people want and need more of a buffer, more of a margin of error to be able to correct things that might fail; having only a few seconds to respond really isn’t enough time. At certain scale it becomes less important, but most places aren’t at that scale. I’d guesstimate that scale doesn’t kick in until you have high hundreds or thousands of systems, preferably at diverse facilities. And most of those systems have to be in like configurations running similar/same applications in a highly redundant configuration. Until you get there I think it will still be very popular to stick with redundant power feeds and redundant UPSs. The costs to recover from such a power outage are greater than what you gain in efficiency (in my opinion).

In case it’s not obvious I feel the same way about flywheel UPSs

Airflow Optimization

Another breakout session was about airflow optimization. The one interesting thing I learned here is that you can measure how efficient your airflow is by comparing the temperature of the intake of the AC units vs the output of them. If the difference is small (sub 10 degrees) then there is too much air mixture going on. If you have a 100% efficient cooling system it will be a 28-30 degree difference. He also mentioned that there isn’t much point in completely isolating thermal zones from each other unless you’re running high densities (at least 8kW per rack). If you’re doing less, the time for ROI is too long for it to be effective.
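That rule of thumb is easy to turn into a quick check. Here is a minimal Python sketch using the two thresholds from the session (under 10 degrees is bad, 28-30 degrees is ideal). The middle band, the wording of the labels, and the sample temperatures are my own assumptions.

```python
def airflow_assessment(intake_temp_f, output_temp_f):
    """Judge airflow efficiency from the AC unit's intake vs output temperature."""
    delta = intake_temp_f - output_temp_f
    if delta < 10:
        return "too much air mixing"
    if delta >= 28:
        return "near ideal"
    # ASSUMPTION: the session did not describe this in-between range.
    return "some mixing, room to improve"


# Hypothetical readings: (intake to the AC unit, output from the AC unit)
for intake, output in ((75, 68), (85, 68), (95, 66)):
    print(intake - output, "degree difference:", airflow_assessment(intake, output))
```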

He mentioned one customer that they worked with who spent $45k on sensors (350 of them I think), and a bunch of other stuff, to optimize the airflow for their 5,000 square foot facility. While they could have saved more by keeping up to 5 of their CRAH(?) units (AC units) turned off, the customer in the end wanted to keep them all on; they were not comfortable operating with temps in the mid 70s. But despite having the ACs on, with the airflow optimization they were able to save ~5.7% in power, which resulted in something like $33k in annual savings. And now they have a good process and equipment to be able to repeat this procedure on their own in other locations if they want.
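Running the numbers from that example gives a feel for the payback. This is a rough Python sketch that assumes the $45k covers the whole project (it may not, since "a bunch of other stuff" was also bought) and that the ~5.7% is measured against the facility’s total power bill. Both are my assumptions, and the speaker’s figures were approximate to begin with.

```python
def payback_years(project_cost, annual_savings):
    """Simple payback period, ignoring financing and maintenance."""
    return project_cost / annual_savings


def implied_annual_power_cost(annual_savings, savings_fraction):
    """Back out the total annual power bill from the savings and the percentage saved."""
    return annual_savings / savings_fraction


PROJECT_COST = 45_000      # dollars, from the talk (scope assumed)
ANNUAL_SAVINGS = 33_000    # dollars per year, "something like" per the talk
SAVINGS_FRACTION = 0.057   # ~5.7% power savings

print("payback (years):", round(payback_years(PROJECT_COST, ANNUAL_SAVINGS), 2))
print("implied annual power bill ($):",
      round(implied_annual_power_cost(ANNUAL_SAVINGS, SAVINGS_FRACTION)))
```

Under those assumptions the project pays for itself in a bit under a year and a half.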

Other stuff

There were a couple of other breakout sessions I went to. One was from some sort of Wall Street research firm, which really didn’t seem interesting; he mostly talked about what his investors are interested in (stupid things like the number of users on Twitter and Facebook; if you know me you know I really hate these social sites).

Then I can’t leave this blog without mentioning the most pointless breakout session ever (sorry, no offense to the person who put it on); it was about Developing Cloud Services. I really wasn’t expecting much, but what I got was nothing. He spent something like 30 minutes talking about how you need infrastructure, power, network, support etc. I talked with another attendee who agreed this guy had no idea what he was talking about; he was just rambling on about infrastructure (he works for a data center company). I can understand talking about that stuff, but everything he covered was so incredibly obvious it was a pointless waste of time.

Shameless Plug

If you’re building out a datacenter with traditional rack mount systems and can choose what PDU you use, I suggest you check out Servertech stuff. I really like them for their flexibility, but on their higher end models they also offer integrated environmental sensors; if you have 2 PDUs per rack, as an example, you can have up to 4 sensors (two in front, two in back; yes, you want sensors in back). I really love the insight that massive instrumentation gives. And it’s nice to have this feature built into the PDU so you don’t need extra equipment.

Servertech also has software that integrates with server operating systems and can proactively and gracefully shut down systems (in any order you want) as the temperature rises in the event of a cooling failure. They call it Smart Load Shedding.
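The general idea of temperature-driven, ordered shutdown can be sketched in a few lines of Python. To be clear, this is not Servertech’s software or API; the tier thresholds, host names, and function name are all hypothetical, and the sketch only decides which hosts should go, it doesn’t shut anything down.

```python
def hosts_to_shed(current_temp_f, tiers):
    """Return hosts to shut down, least important tier first.

    tiers is a list of (threshold_f, [hostnames]); a tier is shed once the
    temperature reaches its threshold.
    ASSUMPTION: this data layout is invented for illustration.
    """
    shed = []
    for threshold_f, hosts in sorted(tiers, key=lambda tier: tier[0]):
        if current_temp_f >= threshold_f:
            shed.extend(hosts)
    return shed


# Hypothetical policy: batch boxes go first, databases go last.
TIERS = [
    (95, ["web1"]),
    (85, ["batch1", "batch2"]),
    (105, ["db1"]),
]

for temp in (80, 90, 100):
    print(temp, "F ->", hosts_to_shed(temp, TIERS))
```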

I think they may also be unique in the industry in having a solution that can measure power usage on a per-outlet basis. And they claim accuracy within something like 1%. I recall asking another PDU manufacturer a few years ago about the prospects of measuring power on a per-outlet basis and they said it was too expensive; it would add ~$50 per outlet monitored. I don’t know what Servertech charges for these new CDUs (as they call them), but I’m sure they don’t charge $50/outlet more.

There may be other solutions in the industry that are similar, but I haven’t found a reason to move away from Servertech yet. My previous PDU vendor had some pretty severe quality issues (~30% failure rate), so of course not all solutions are equal.

Conclusion

The conclusion to this is that it pretty much lived up to my expectations. I would not pay to attend this event unless I was building/designing data centers. The sales guys from Datacenter Dynamics tried to convince me that it would be of value to me, and on their site they list a wide range of professions that can benefit from it. Maybe it’s true; I learned a few things, but really nothing that will cause me to adjust strategy in the near future.
