It's been talked about for quite a while. I think Google was probably the first to widely deploy batteries with their servers, removing the need for larger batteries at the data center or rack level.
Next came Microsoft with what I consider to be a better (or at least more efficient) design: Google's servers apparently use AC power (though the pictures could well be outdated, who knows what they use now), while Microsoft took the approach of rack-level UPSs and DC power to the servers.
I was at a Data Center Dynamics conference a couple of years back where a presenter talked about a similar topic, though he didn't use batteries; it was more along the lines of big capacitors (which had the risk of exploding, no less).
Anyways, I was wandering along and came across this, which seems really new. It goes beyond the notion that most power events last only two seconds and gives a server an internal battery capacity of anywhere from 30 seconds to 7 minutes, depending on sizing and load.
It looks like a really innovative design, and it's nice to see a commercial product in this space being brought to market. I'm sure you can get similar things from the depths of the bigger players if you're placing absolutely massive orders of servers, but for more normal folks I'm not aware of a similar technology being available.
These can be implemented in 1+1+1 (2 AC modules + 1 UPS module), 1+2 (1 AC + 2 UPS @ 2000W), or 2+2 (2 AC + 2 UPS @ 2000W) configurations.
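As a rough sanity check on that 30-seconds-to-7-minutes range, here's a quick back-of-the-envelope runtime estimate. This is just a sketch: the 50 Wh capacity and 90% efficiency below are hypothetical placeholders, not numbers from the spec sheet, and real runtime also depends on conversion losses and battery age.

# Back-of-the-envelope UPS runtime estimate; capacity/efficiency values
# are made-up placeholders, not vendor specs.
def runtime_minutes(battery_wh, load_watts, efficiency=0.9):
    usable_wh = battery_wh * efficiency      # rough allowance for conversion losses
    return usable_wh / load_watts * 60

for load_watts in (400, 1000, 2000):         # 2000 W = the module's rated output
    print(f"{load_watts:>4} W load -> ~{runtime_minutes(50, load_watts):.1f} minutes")

With those assumed numbers you land somewhere between about a minute and a half at full load and nearly seven minutes at light load, which is at least in the same ballpark as the claimed range.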
It does not appear to be an integrated PSU + battery, but rather a battery module that fits alongside a PSU, in place of what could otherwise be another PSU.
You may have issues running these battery units in third-party data centers: I don't see any integration for Emergency Power Off (EPO), and some facilities are picky about that kind of thing. I can imagine the look on some uninformed tech's face when they hit the EPO switch and the lights go out, but hundreds or thousands of servers keep humming along. That would be a funny sight to see.
While I'm here, I guess I should mention the FatTwin systems they released a few weeks ago, which are equally innovative compared to the competition in the space, at least. Sort of puts the HP SL-series to shame, really. I don't think you'd want to run mission-critical stuff on this gear, but for the market it's aimed at (HPC, web farms, Hadoop, etc.) they look efficient, flexible and very dense, quite a nice step up from their previous Twin systems.
It's been many years since I used Super Micro. I suppose the thing they have traditionally lacked more than anything else, in my experience (which again isn't recent, so maybe this is fixed), is better fault detection and reporting of memory errors, along the lines of HP's Advanced ECC or IBM's Chipkill (the damn thing was made for NASA, what more do you need!).
I recall some of the newer Intel chips have something similar in the newer chipsets, though the HP and IBM stuff is more CPU agnostic (e.g. supports AMD 🙂 ). I don't know how the new Intel memory protection measures up to Advanced ECC / Chipkill. Note I didn't mention Dell, because Dell has no such technology either (they too rely on the newer Intel chips to provide a similar function, for their Intel boxes at least).
The other aspect is that when a memory error is reported on an HP system, for example (at least one of the better ones, 300-series and above), typically a little LED lights up next to the socket having errors, along with perhaps even a more advanced diagnostics panel on the system to show which socket has issues before you even open it up. Since memory errors were far and away the #1 issue I had when I had Super Micro systems, these features became sorely missed very quickly. Another issue was remote management, but they have addressed this to some extent in their newer KVM management modules (now that I think about it, the server that powers this blog is a somewhat recent Supermicro with KVM management, but from a company/work/professional perspective it's been a while since I used them).
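For what it's worth, on Linux you can at least pull per-DIMM corrected/uncorrected error counts out of the kernel's EDAC driver yourself, even on boxes without fancy fault LEDs or diagnostic panels. A minimal sketch (the sysfs layout varies by kernel version, so treat the exact paths as an assumption):

# Dump per-DIMM ECC error counters from the Linux EDAC sysfs interface.
# Newer kernels expose one dimm* directory per module; older kernels
# use csrow*/ce_count instead.
import glob, os

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "?"

for dimm in sorted(glob.glob("/sys/devices/system/edac/mc/mc*/dimm*")):
    label = read(os.path.join(dimm, "dimm_label"))    # which slot/channel this is
    ce = read(os.path.join(dimm, "dimm_ce_count"))    # corrected (ECC) errors
    ue = read(os.path.join(dimm, "dimm_ue_count"))    # uncorrected errors
    print(f"{dimm}: {label} corrected={ce} uncorrected={ue}")

It's no substitute for an LED lighting up next to the failing socket, but it at least tells you which slot to go pull.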
What, no talk of the P10000 SSD box from 3par today? 🙂
Personally, if I was in a situation where I had to push the EPO button, I doubt that 1000s of servers still running would even be noticed in the panic of whatever is happening. 🙂
Comment by Justin Brodley — July 31, 2012 @ 8:45 am
Hey Justin!
Yeah, no comment on the P10k SSD thing yet; I'm looking for the real numbers. I saw the article on The Reg yesterday of course, and there are still no results posted on SPC's website. I pinged the HP storage blogger and he was looking into it more last night; I see he posted something this morning on it, but it's light on details. Although really, I can't see anyone ever buying a P10k with nothing but SSDs; it's just, well, you know, crazy. I would like to see SPC-1 results (perhaps one of those endurance tests) with Adaptive Optimization and SSDs.
They may not post actual results; I suppose they just assume that since 512 SSDs can drive X amount of I/O to the controllers, and they know they are not "spindle bound" but rather controller bound, it's a safe bet that it would produce similar results (though with much lower latency, I imagine). Another question I would have for them is whether this would be in RAID 1 or RAID 5; I would hope that they could run the system in RAID 5, given that they aren't spindle bound in an all-SSD config, and get the same or better results as RAID 1 does on spinning rust.
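For what the capacity math is worth on the RAID 1 vs RAID 5 question (the drive count, drive size and 7+1 set width below are illustrative assumptions on my part, not the actual tested config):

# Usable capacity: RAID 1 mirroring vs RAID 5 with 7+1 parity sets.
drives, size_gb, set_width = 512, 200, 8     # assumed values for illustration only
raid1_tb = drives * size_gb / 2 / 1000                            # half lost to mirrors
raid5_tb = drives * size_gb * (set_width - 1) / set_width / 1000  # 1/8 lost to parity
print(f"RAID 1 usable: ~{raid1_tb:.1f} TB, RAID 5 (7+1) usable: ~{raid5_tb:.1f} TB")

If the controllers are the bottleneck anyway, giving back that much capacity for roughly the same throughput seems like the obvious trade.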
Comment by Nate — July 31, 2012 @ 10:26 am