Diggin' technology every day

April 30, 2010

Violate Electrical specs for more servers?

Filed under: Datacenter,General,Random Thought — Nate @ 8:46 pm

As usual with big blog posts, I go back and re-read the post what feels like 60 times and think about what I wrote.

Well, I was re-reading my most recent post about Datacenter Dynamics, specifically the stranded power section and how operators of hyperscale facilities want to draw every watt they can off their circuits to maximize efficiency, and I got to thinking..

Would they go so far as to violate electrical specs by drawing more than 80% of the power on a particular circuit? In theory at least, if they construct the components properly they can probably do it fairly safely. I learned a few years ago from someone that the spec in question is NEC Section 384-16(c), which I think in part reads:

The NEC requires the branch circuit computed load for conductor sizing to be sized at 125% of the continuous load, plus the noncontinuous load (100%).

That equates to an 80% ceiling on continuous utilization. If you know your power usage levels and your loads that well, do you think such hyperscale facilities would run at 85%? 90%? 95% of circuit load? With all of the other extreme measures being taken to maximize efficiency, I wouldn't put it past them. They're going so far as to design special motherboards and specify components right down to the VRMs to lower power usage. I can see them investing in higher grade electrical gear that allows them to safely operate at higher circuit draws, especially when you take power capping into account as well. After all, if you're spending the effort to shave single-digit watts off your systems, that extra 20% of capacity on the circuit has to be very tempting to use.
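
To make the rule concrete, here is a quick back-of-the-envelope sketch (my own arithmetic, not from the spec text) of what the 80% ceiling means for common circuit sizes:

# Rough illustration of the NEC continuous-load rule: the branch circuit is
# sized at 125% of the continuous load, which works out to drawing no more
# than 80% of the breaker rating continuously.
def max_continuous_amps(breaker_rating_amps):
    return breaker_rating_amps / 1.25      # same as breaker_rating * 0.80

for rating in (15, 20, 30):                # common branch circuit sizes
    limit = max_continuous_amps(rating)
    print(f"{rating}A circuit: {limit:.0f}A continuous ({limit / rating:.0%} of rating)")

# A hyperscale operator pushing a 30A circuit to 90% would be drawing 27A
# continuously, 3A over the 24A the rule allows.
print(f"30A circuit at 90%: {30 * 0.90:.0f}A vs the {max_continuous_amps(30):.0f}A limit")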

I remember doing a load test a few years ago on one of the aforementioned lower quality power strips (they weren't cheap, but the company's QA wasn't up to par); it was a 30A PDU. I loaded it up with a bunch of systems, walked away for a couple of minutes, and came back shocked to see the meter reporting 32A being drawn. I immediately yanked some of the power cords out to get it back under 30A. After talking with the manufacturer (or maybe it was another manufacturer, I don't recall), they said that was not unexpected; the breaker has some sort of internal timer that will trip based on the amount of excess load on the circuit. So if you're drawing 30A it probably won't trip for a while, if you're drawing 32A it may trip after a few minutes, and if you try to draw 40A it will likely trip immediately (I'm guessing here).

April 29, 2010

Datacenter Dynamics

Filed under: Datacenter,Events — Nate @ 7:09 pm

For the past couple of years the folks behind the Datacenter Dynamics conferences have been hounding me to fork over the $500 fee to attend their conference. I've looked at it, and it really didn't seem aimed at people like me; it's aimed more at people who build/design/manage data centers. I mostly use co-location space. While data center design, at least the leading edge technology, is somewhat interesting to me, it's not something I work with.

So when a friend of mine offered a couple of days ago to get me in for free, I took him up on it. It was a chance to get away from the office, and the conference is about one mile from my apartment.

The keynote, given by a Distinguished Engineer from Microsoft, was somewhat interesting. I'll skip the obvious stuff; he had a couple of less obvious things to say:

Let OEMs innovate

One thing MS does, he says, is have a dedicated team of people that instrument subsets of their applications and servers and gather performance data (if you know me, you probably know I collect a lot of stats myself: everything from apps to OS to network, load balancers, power strips, and storage).

They take this data, build usage profiles for their applications, and come up with server designs that are as generic as possible while still being specific in certain areas:

  • X amount of CPU capacity
  • X amount of memory
  • X amount of disk
  • X amount of I/O
  • Power envelope on a per-rack and per-container basis
  • Operational temperature and humidity levels the systems will operate in

He raised the point that if you get too specific you tie the hands of the OEMs and they can't get creative. He mentioned that on a few occasions they sent out RFPs and got back very different designs from different OEMs. He says they use 3 different manufacturers (two I know of are SGI/Rackable and Dell; I don't know the third). They apparently aren't big enough to deal with more OEMs (a strange statement, I thought), so they only work with a few.

Now I think for most organizations this really isn’t possible, as getting this sort of precise information isn’t easy, especially from the application level.

They seem to aim for operating the servers at temperatures up into the mid 90s (Fahrenheit), which is pretty extreme, but these days not too uncommon among the hyperscale companies. I saw the resume of an MS operations guy recently that claimed he used monitoring software to watch over 800,000 servers.

He emphasized purpose-built servers: eliminating components that are not needed to reduce cost and power usage, and using low power processors regardless of application. The emphasis was on performance/watt/TCO$, I think is what he said.

Stranded Power

He also emphasized eliminating stranded power: use every watt that is available to you, because stranded power is very expensive. To achieve this they leverage power capping in the servers to get more servers per rack; because they know what their usage profile is, they can cap their servers at a certain level. I know HP has this technology and I assume others do too, though I haven't looked. One thing that was confusing to me when quizzing the HP folks on it was that the capping is done at the server or chassis (in the case of blades) level. To me that doesn't seem very efficient; I would expect it to at least be at the rack/circuit level. I mean, if the intelligence is already in the servers you should be able to aggregate that intelligence outside the individual servers so they operate more as a collective, and gain even more efficiency. You could potentially extend the concept to the container level as well.
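
To illustrate what I mean by aggregating the intelligence, here is a rough sketch (entirely hypothetical, not how HP or anyone else actually implements it) of a rack-level controller dividing a circuit's power budget among servers based on their recent demand:

# Hypothetical sketch: divide a rack/circuit power budget among servers based
# on their recent demand, instead of giving every server the same static cap.
# Not based on any vendor's actual implementation.
def allocate_caps(rack_budget_watts, demands, floor_watts=150):
    """Give each server at least floor_watts, then split the remaining
    budget in proportion to each server's recent measured demand."""
    remaining = rack_budget_watts - floor_watts * len(demands)
    if remaining < 0:
        raise ValueError("budget too small for even the minimum caps")
    total_demand = sum(demands) or 1
    return [floor_watts + remaining * d / total_demand for d in demands]

# Example: a 5kW circuit shared by 10 servers with uneven recent load (watts).
recent_demand = [180, 220, 450, 300, 150, 150, 400, 260, 200, 190]
caps = allocate_caps(5000, recent_demand)
for i, (demand, cap) in enumerate(zip(recent_demand, caps)):
    print(f"server {i}: demand {demand}W -> cap {cap:.0f}W")
print(f"total of all caps: {sum(caps):.0f}W against the 5000W budget")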

Idle servers still draw significant power

In my own experience measuring power at the PDU/CDU level over the past 6 years, I have seen that typical power usage fluctuates at most 10-15% from idle to peak (application peak, not really server peak). This guy from MS says that even with the most advanced "idle" states the latest generation CPUs offer, an idle server still draws about 50% of its peak power. So there still seems to be significant room for improving power efficiency when a system is idle. Perhaps that is why Google is funding some project to this end.
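
To put that in perspective, here is some quick arithmetic of my own, assuming (simplistically) that power scales linearly with utilization between the idle floor and peak:

# Assume (simplistically) that power scales linearly with utilization between
# an idle floor and peak. With idle at 50% of peak, a nearly idle server
# still burns more than half of its peak power.
def power_fraction(utilization, idle_fraction=0.50):
    return idle_fraction + (1.0 - idle_fraction) * utilization

for util in (0.00, 0.07, 0.25, 0.50, 1.00):
    print(f"{util:>4.0%} utilization -> {power_fraction(util):.0%} of peak power")
# A server idling at 7% utilization would still draw roughly 54% of its
# peak power under this model.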

MS’s Generation 4 data centers

I'm sure others have read about them, but you know me, I really don't pay much attention to what MS does, and really haven't for a decade or more, so this was news to me..

He covered the evolution of their data center designs, topping out at what he called Gen 3 data centers in Chicago and Ireland, which are container based.

Their generation 4 data centers are also container based but appear to be significantly more advanced from an infrastructure perspective than current data center containers. If you have Silverlight you can watch a video on it here; it's the same video shown at the conference.

I won't go into great detail since I'm sure you can find it online, but the basic idea is that it is designed to operate in seemingly almost any environment, using just outside air for cooling. If it gets too hot, a water system kicks in to cool the incoming air (an example given was lowering ~105 degree outside air to ~95 degrees inside, cool enough for the servers). If it gets too cold, an air re-circulation system kicks in and mixes a portion of the server exhaust back in with the incoming cold air. If it gets too humid it does something else to compensate (I forget what).

They haven't deployed it at any scale yet, so they don't have hard data, but they have done enough research to move forward with the project.

I'll tell you what, I'm glad I don't deal with server hardware anymore; these new data centers run so hot I'd want them to turn on the cooling just for me. I can't work in those temperatures, I'd die.

Server level UPS

You may have heard people like Google talking about using batteries in their servers. I never understood this concept myself. I can understand not using big centralized UPSs in your data center, but I would expect the logical move to be rack-level UPSs, ones that take AC input and output DC power directly to the servers.

One of the main complaints about normal UPSs, as far as efficiency goes, is the double conversion that goes on: incoming AC is converted to DC for the batteries, then back to AC to go to the racks. I think this is mainly because DC power isn't very good over long distances. But there are already some designs on the market from companies like SGI (aka Rackable) for rack-level power distribution (i.e. no power supplies in the servers). This is the CloudRack product, something I've come to really like since I first got wind of it in 2008.

If you have such a system, and I imagine Google does something similar, I don't understand why they'd put batteries in the server instead of integrating them into the rack power distribution, but whatever, it's their choice.

I attended a breakout session on this topic presented by someone from the Electric Power Research Institute. The speaker got deeper into electrical terminology than I could follow, but I got some of the basic concepts.

The most interesting claim he made was that 90% of electrical disruptions last less than two seconds. That alone was enough for me to understand why people are looking at server-level batteries instead of big centralized systems.

They did pretty extensive testing of system components against power disruptions and had some interesting results. Honestly, I can't really recite them, they were pretty technical: they measured power disruptions in numbers of cycles (each cycle is 1/60th of a second), comparing the duration of a disruption against its magnitude (in their case, voltage sags). In effect they measured the "breaking" point of equipment, what sort of disruptions it can sustain. He said that for the most part power supplies are rated to ride through 4 cycles of disruption, or 4/60ths of a second, likely without noticeable impact. Beyond that, in most cases the equipment won't get damaged, but it will shut off or reboot.

He also brought up that in their surveys it was very rare for power sags to go below 60% of nominal voltage. That made me think about some older, lower quality UPSs I used to have in combination with auto-switching power supplies. When the UPS went to battery it caused a voltage spike, which convinced the auto-switching power supplies to switch from 120V to 208V, and that immediately tripped their breaker/safety mechanism because the voltage returned to 120V within who knows how many cycles. I remember when I was ordering special high grade power supplies I had to request that they be hard set to 110V to avoid this. Eventually I replaced the UPSs with better ones and haven't had the problem since.

But it got me thinking: could the same thing happen here? Pretty much all servers now have auto-switching power supplies, and one of this organization's tests involved 208V/220V power sags dropping as low as 60% of nominal voltage. Could that convince the power supply it's time to switch to 120V? I didn't get to ask..
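
Some rough arithmetic of my own on why that scenario worries me (the session didn't do this math):

# Rough arithmetic: a sag to 60% of nominal on a 208V or 220V feed lands right
# in the neighborhood of 120V, exactly the range an auto-ranging power supply
# treats as a valid low-line input.
for nominal in (208, 220):
    sag = nominal * 0.60
    print(f"{nominal}V feed sagging to 60% -> {sag:.0f}V (nominal low-line is 120V)")

# And the timing side of it: power supplies are typically rated to ride
# through about 4 cycles of disruption.
CYCLE = 1 / 60                                    # one AC cycle at 60Hz, in seconds
print(f"4 cycles  = {4 * CYCLE * 1000:.0f} ms of ride-through")
print(f"2 seconds = {2 / CYCLE:.0f} cycles, the window 90% of disruptions fall inside")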

They constructed a power supply board with what I believe were special capacitors (they were certainly not batteries, though there may have been another technical term that escapes me), which can store enough energy to ride out that two-second window in which 90% of power problems occur. He talked about other components that assist in charging this capacitor; since it was an order of magnitude larger than the regular ones in the system, special safeguards had to be in place to prevent it from exploding or something when it charged up. Again, lots of electrical stuff beyond my areas of expertise (and interest, really!).

They demonstrated it in the lab, and it worked well. He said there really isn't anything like it on the market, and this is purely a lab project; they don't plan to sell anything. The power supply manufacturers are able to do this, but they are waiting to see if a market develops, whether there will be demand in the general computing space to make such technology available.

I'd be willing to bet that for the most part people will not use server-level batteries. In my opinion it doesn't make sense unless you're operating at really high levels of scale. For the most part people want and need more of a buffer, more of a margin for error, to be able to correct things that might fail; having only a few seconds to respond really isn't enough time. At a certain scale that becomes less important, but most places aren't at that scale. I'd guesstimate that scale doesn't kick in until you have high hundreds or thousands of systems, preferably at diverse facilities, and most of those systems have to be in like configurations running the same or similar applications in a highly redundant configuration. Until you get there I think it will remain very popular to stick with redundant power feeds and redundant UPSs. The cost to recover from such a power outage is greater than what you gain in efficiency (in my opinion).

In case it's not obvious, I feel the same way about flywheel UPSs.

Airflow Optimization

Another breakout session was about airflow optimization. The one interesting thing I learned here is that you can measure how efficient your airflow is by comparing the temperature of the air returning to the AC units against the temperature of the air they put out. If the difference is small (under 10 degrees) then there is too much air mixing going on. With a 100% efficient cooling setup the difference will be 28-30 degrees. He also mentioned that there isn't much point in completely isolating thermal zones from each other unless you're running high densities (at least 8kW per rack). If you're doing less, the ROI takes too long for it to be worthwhile.
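
That rule of thumb is trivial to express; the thresholds below are the ones from the session, the code is just my own illustration:

# Rule of thumb from the session: compare the return-air temperature at the
# AC units against their supply-air temperature. A small delta-T means hot
# and cold air are mixing; a delta-T near 28-30F means efficient airflow.
def airflow_assessment(return_temp_f, supply_temp_f):
    delta_t = return_temp_f - supply_temp_f
    if delta_t < 10:
        return delta_t, "lots of air mixing, poor airflow"
    if delta_t >= 28:
        return delta_t, "close to fully efficient airflow"
    return delta_t, "room for improvement"

for ret, sup in ((72, 65), (85, 65), (93, 65)):
    dt, verdict = airflow_assessment(ret, sup)
    print(f"return {ret}F / supply {sup}F -> delta-T {dt}F: {verdict}")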

He mentioned one customer they worked with that spent $45k on sensors (350 of them, I think) and a bunch of other equipment to optimize the airflow in their 5,000 square foot facility. While they could have saved more by keeping up to 5 of their CRAH units (AC units) turned off, in the end the customer wanted to keep them all on; they were not comfortable operating with temperatures in the mid 70s. But even with all the ACs on, the airflow optimization let them save ~5.7% in power, which resulted in something like $33k in annual savings. And now they have a good process and the equipment to repeat this procedure on their own at other locations if they want.

Other stuff

There were a couple of other breakout sessions I went to. One was from some sort of Wall Street research firm and really didn't seem interesting; he mostly talked about what his investors are interested in (stupid things like the number of users on Twitter and Facebook; if you know me you know I really hate these social sites).

Then I can't leave this post without mentioning the most pointless breakout session ever (sorry, no offense to the person who put it on): it was about Developing Cloud Services. I really wasn't expecting much, but what I got was nothing. He spent something like 30 minutes talking about how you need infrastructure, power, network, support, etc. I talked with another attendee who agreed this guy had no idea what he was talking about; he was just rambling on about infrastructure (he works for a data center company). I can understand talking about that stuff, but everything he covered was so incredibly obvious it was a pointless waste of time.

Shameless Plug

If you're building out a datacenter with traditional rack mount systems and can choose your PDUs, I suggest you check out Servertech's stuff. I really like them for their flexibility, and on their higher end models they also offer integrated environmental sensors; if you have 2 PDUs per rack, for example, you can have up to 4 sensors (two in front, two in back, and yes, you want sensors in back). I really love the insight that massive instrumentation gives, and it's nice to have this feature built into the PDU so you don't need extra equipment.

Servertech also has software that integrates with server operating systems and can proactively and gracefully shut down systems (in any order you want) as the temperature rises in the event of a cooling failure; they call it Smart Load Shedding.

I think they may also be unique in the industry in having a solution that can measure power usage on a per-outlet basis, and they claim accuracy within 1% or so. I recall asking another PDU manufacturer a few years ago about the prospects of measuring power per outlet, and they said it was too expensive; it would add ~$50 per outlet monitored. I don't know what Servertech charges for these new CDUs (as they call them), but I'm sure it isn't $50/outlet more.

There may be other similar solutions in the industry; I haven't found a reason to move away from Servertech yet. My previous PDU vendor had some pretty severe quality issues (~30% failure rate), so clearly not all solutions are equal.


The conclusion is that it pretty much lived up to my expectations. I would not pay to attend this event unless I was building/designing data centers. The sales guys from Datacenter Dynamics tried to convince me that it would be of value to me, and on their site they list a wide range of professions that can benefit from it. Maybe that's true; I learned a few things, but really nothing that will cause me to adjust strategy in the near future.

April 26, 2010

40GbE for $1,000 per port

Filed under: Networking,News — Nate @ 8:32 am

It seems it wasn't too long ago that 10GbE broke the $1,000/port price barrier. Now it seems we have reached that point with 40GbE as well: my own personal favorite networking company, Extreme Networks, today announced the availability of an expansion module for the X650 and X480 stackable switches that adds 40GbE support. Top-of-rack line rate 10GbE just got more feasible.

LAS VEGAS, NV, Apr 26, 2010 (MARKETWIRE via COMTEX News Network) — Extreme Networks, Inc. (NASDAQ: EXTR) today announced highly scalable 40 Gigabit Ethernet (GbE) network solutions at Interop Las Vegas. The VIM3-40G4X adds four 40 GbE connections to the award-winning Summit(R) X650 Top-of-Rack stackable switches for $3,995, or less than $1,000 per port. The new module is fully compatible with the existing Summit X650 and Summit X480 stackable switches, preserving customers’ investments while providing a smooth upgrade to greatly increased scalability of both virtualized and non-virtualized data centers.


Utilizing Ixia’s IxYukon and IxNetwork test solutions, Extreme Networks demonstrates wire-speed 40Gbps performance and can process 60 million packets per second (120Mpps full duplex) of data center traffic between ToR and EoR switches.

April 19, 2010

Arista ignites networks with groundbreaking 10GbE performance

Filed under: Networking,News — Nate @ 8:53 am

In a word: Wow

I just read an article from our friends at The Register on a new 384-port chassis 10GbE switch that Arista is launching. From a hardware perspective the numbers are just jaw dropping.

A base Arista 7500 costs $140,000, and a fully configured machine with all 384 ports and other bells and whistles runs to $460,800, or $1,200 per port. This machine will draw 5,072 watts of juice and take up a little more than quarter of a rack.

Compare this to a Cisco Nexus 7010 setup to get 384 wirespeed ports and deliver the same 5.76 Bpps of L3 throughput, and you need to get 18 of the units at a cost of $13.7m. Such a configuration will draw 160 kilowatts and take up 378 rack units of space – nine full racks. Arista can do the 384 ports in 1/34th the space and 1/30th the price.
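
A quick sanity check of those ratios, using my own arithmetic on the numbers in the article (the 11U figure is my reading of "a little more than a quarter of a rack"):

# Sanity-checking the comparison: price, power and space for 384 wire-speed
# 10GbE ports, a single Arista 7500 vs 18x Cisco Nexus 7010.
arista = {"price": 460_800, "watts": 5_072, "rack_units": 11}      # ~quarter of a 42U rack
cisco  = {"price": 13_700_000, "watts": 160_000, "rack_units": 378}

print(f"Arista price per port: ${arista['price'] / 384:,.0f}")
for metric in ("price", "watts", "rack_units"):
    print(f"{metric:>10}: Cisco/Arista = {cisco[metric] / arista[metric]:.0f}x")
# Roughly 30x the price, 32x the power, 34x the space, matching the
# 1/30th-the-price and 1/34th-the-space claims above.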

I love the innovation that comes from these smaller players, really inspiring.

April 14, 2010

First SPC-1 Numbers with automagic storage tiering

Filed under: News,Storage — Nate @ 8:38 am

IBM recently announced that they are adding an "easy tier" of storage to some of their storage systems. This seems to be their version of what I have been calling automagic storage tiering. They are doing it at the sub-LUN level in 1GB increments. And they recently posted SPC-1 numbers for this new system; finally someone posted numbers.

Configuration of the system included:

  • 1 IBM DS8700
  • 96 1TB SATA drives
  • 16 146GB SSDs
  • Total ~100TB raw space
  • 256GB Cache

Performance of the system:

  • 32,998 IOPS
  • 34.1 TB Usable space

Cost of the system:

  • $1.58 Million for the system
  • $47.92 per SPC-1 IOP
  • $46,545 per usable TB

Now I’m sure the system is fairly power efficient given that it only has 96 spindles on it, but I don’t think that justifies the price tag. Just take a look at this 3PAR F400 which posted results almost a year ago:

  • 384 disks, 4 controllers, 24GB data cache
  • 93,050 SPC-1 IOPS
  • 26.4 TB Usable space (~56TB raw)
  • $548k for the system (I’m sure prices have come down since)
  • $5.89 per SPC-1 IOP
  • $20,757 per usable TB

The system used 146GB disks; today the 450GB disks seem very reasonably priced, so I would opt for those instead and get the extra space for not much of a premium.

Take a 3PAR F400 with 130 450GB 15k RPM disks: that would be about 26TB of usable space with RAID 1+0 (the tested configuration above is 1+0). With 33.8% of the spindles of the 384-disk system above, it should deliver roughly that fraction of the performance, say 31,487 SPC-1 IOPS, very close to the IBM system, and I bet the price of the 3PAR would be close to half of the $548k above (taking into account that the controllers in any system are a good chunk of the cost). 3PAR scales nearly linearly, which makes extrapolations like this possible and reasonably accurate. And you can sleep well at night knowing you can triple your space/performance online without service disruption.
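
The extrapolation itself is simple enough to show; my arithmetic below, assuming the near-linear scaling I mentioned, lands within a rounding error of the figure above:

# Extrapolating the published 384-disk 3PAR F400 SPC-1 result down to a
# 130-disk configuration, assuming performance scales roughly linearly
# with spindle count.
published_iops  = 93_050        # SPC-1 IOPS from the 384-disk F400 result
published_disks = 384
proposed_disks  = 130

fraction = proposed_disks / published_disks
estimate = published_iops * fraction
print(f"{proposed_disks} disks is {fraction:.1%} of the tested config "
      f"-> ~{estimate:,.0f} SPC-1 IOPS (IBM's result: 32,998)")

# Mirrored capacity on 450GB drives, before sparing and other overhead
# (~26TB usable per the estimate above):
print(f"raw mirrored capacity: ~{proposed_disks * 0.45 / 2:.1f} TB")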

Note that you can of course equip a 3PAR system with SSDs and use automagic storage tiering as well (they call it Adaptive Optimization), if you really wanted to. The 3PAR system moves data around in 128MB increments, by contrast.

It seems the cost of the SSDs and the massive amount of cache IBM dedicated to the system more than offset the benefits of using lower cost nearline SATA disks in the system. If you do that, what’s the point of it then?

So consider me not impressed with the first results of automagic storage tiering. I expected significantly more out of it. Maybe it’s IBM specific, maybe not, time will tell.

April 9, 2010

Found a use for the cloud

Filed under: News,Virtualization — Nate @ 1:42 pm

Another interesting article on Datacenter Knowledge mentioned the U.S. Government's use of the Terremark cloud. I recall reading about it briefly when it first launched, but seeing the numbers again made me do another double take.

“One of the most troubling aspects about the data centers is that in a lot of these cases, we’re finding that server utilization is actually around seven percent,” Federal Chief Information Officer Vivek Kundra said


Yes, you read that correctly. A government agency was going to spend $600,000 to set up a blog.


The GSA previously paid $2.35 million in annual costs for, including $2 million for hardware refreshes and software re-licensing and $350,000 in personnel costs, compared to the $650,000 annual cost to host the site with Terremark.

For $650k/yr I bet the site runs on only a few servers (a dozen or less) and has less than a TB of total disk space.

April 8, 2010

What can you accomplish in two microseconds?

Filed under: Networking — Nate @ 4:30 pm

There is an interesting post on the Datacenter Knowledge site about the growth in low latency data centers; the two things at the end that were pretty shocking to me were:

“I still find it amazing,” said McPartland. “A blink of an eye is 300 milliseconds. That’s an eternity in this business.”

How much of an eternity: “You can do a heck of a lot in 2 microseconds,” said Kaplan.

The latency requirements these fast stock traders are looking for are interesting. It reminded me of a network upgrade the NYSE did a while back, deploying some Juniper gear, as reported by The Register:

With the New York Stock Exchange down on Wall Street being about ten miles away from the data center in New Jersey, the delay between Wall Street and the systems behind the NYSE is about 105 microseconds. This is not a big deal for some trading companies, but means millions of dollars for others.


NYSE Technologies, which is the part of the company that actually hooks people into the NYSE and Euronext exchanges, has rolled out a market data system based on the Vantage 8500 switches. The system offers latencies per core switch in the range of 25 microseconds for one million messages per second on messages that are 200 bytes in size.

The Vantage 8500 switch, announced last year, seems pretty scalable, claiming non-blocking 10GbE scalability for up to 3,400 servers.

Arista Networks somewhat recently launched an initiative aimed at this market segment as well.

Since the Juniper announcement, Force10 has announced that the NYSE chose their gear for its next generation data centers. The Juniper switching gear so far hasn't looked all that great compared to the competition, so I'd be curious how the Force10 deployment relates to the earlier Juniper deployment:

SAN JOSE, Calif., November 9, 2009 – Force10 Networks, Inc., the global technology leader that data center, service provider and enterprise customers rely on when the network is their business, today announced that the NYSE Euronext has selected its high-performance 10 Gigabit Ethernet (10 GbE) core and access switches to power the management network in their next-generation data centers in the greater New Jersey and London metro areas.

Force10 of course has been one of the early innovators and leaders in 10GbE port density and raw throughput (at least on paper; I've never used their stuff personally, though I have heard good things). On a related note, it wasn't long ago that they filed for an IPO. I wish them the best, as Force10 really is an innovative company and I've admired their technology for several years now.

(how do I remember all of these news articles?)

April 3, 2010

Terremark vCloud Express: Day 1

Filed under: Virtualization — Nate @ 12:19 pm

You may have read another one of my blog entries, "Why I hate the cloud"; I also mentioned in "Lesser of two evils" how I've been hosting my own email and other services for more than a decade.

So what’s this about? I still hate the cloud for any sort of large scale deployment, but for micro deployments it can almost make sense. Let me explain my situation:

About 9 years ago the ISP I used to help operate more or less closed shop, and I relocated what was left of the customers to a dedicated little server on my home DSL line (1Mbps/1Mbps, 8 static IPs). My ISP got bought out, then got bought out again, and started jacking up the rates (from $20/mo to ~$100/mo, plus ~$90/mo for Qwest professional DSL). Hosting at my apartment was convenient but at the same time a bit of a ball and chain, as it made it very difficult to move. Coordinating the telco move and the ISP move with minimal downtime, well, let's just say with DSL that's about impossible. I managed to mitigate one move in 2001 by temporarily locating my servers on my "normal" company's network for a few weeks while things got moved.

A few years ago I was also hit with a 27 hour power outage (despite being located in a downtown metropolitan area; everyone got hit by that storm). Shortly after that I decided that longer term, a co-location was the best fit for me. So phase one was to virtualize the pair of systems with VMware. I grabbed an older server I had lying around and did that, and ran it for a year; it worked great (though the server was really loud).

Then I got another email saying my ISP had been bought out yet again, and this time the company was going to force me to change my IP addresses, which can be problematic when you're hosting your own DNS. That was the last straw. I found a nice local company to host my server at a reasonable price. The facility wasn't world class by any stretch, but the world class facilities in the area had little interest in someone wanting to host a single 1U box that averages less than 128kbps of traffic at any given time. But it would do for now.

I run my services on a circa 2004 dual Xeon system with 6GB of memory and ~160GB of disk on a 3Ware 8006-2 RAID controller (RAID 1). I absolutely didn't want to go to one of those cheap crap hosting providers with massive downtime and no SLAs. I also had absolutely no faith in the earlier generation "UML" VMs (yes, I know Xen and UML aren't the same, but I trust them the same amount, i.e. none). My data and privacy are fairly important to me and I am willing to pay extra to try to maintain them.

So early last year my RAID card told me one of my disks was about to fail and to replace it, so I did, rebuilt the array, and off I went again. A few months later the RAID card told me another disk was about to fail (there are only two disks in this system), so I replaced that disk, rebuilt, and off I went. Then a few months later the RAID card again said a disk was not behaving right and I should replace it. Three disk replacements in less than a year. Though really it's been two; I've ignored the most recent failing drive for several months now. Media scans return no errors, however RAID integrity checks always fail, causing a RAID rebuild (this happens once a week). Support says the disk is suffering from timeouts. There is no backplane on the system (and thus no hot swap, making disk replacements difficult). Basically I'm getting tired of maintaining hardware.

I looked at the cost of a good quality server with hot swap, remote management, etc., something that can run ESX: $3-5k. I could spend $2-3k and stick with VMware Server on top of Debian. A local server manufacturer has their headquarters literally less than a mile from my co-location, so it is tempting to keep doing it on my own, and if my needs were greater I would for sure; cloud does not make sense in most cases in my opinion, but in this case it can.

If I try to price out a cloud option matching that $3-5k server purely from a CPU/memory perspective, the cloud option would cost significantly more. But I looked closer and I really don't need that much capacity for my stuff. My current VMware host runs at ~5-8% CPU usage on average on six year old hardware. I have 6GB of RAM but I'm only using 2-3GB at best. Storage is the biggest headache for me right now hosting my own stuff.

So I looked to Terremark, who seem to have a decent operation going; for the most part they know what they are doing (they still make questionable decisions, though I think most of those are not made by the technical teams). I looked to Terremark for a few reasons:

  • Enterprise storage either from 3PAR or EMC (storage is most important for me right now given my current situation)
  • Redundant networking
  • Tier IV facilities (my current facility lacks true redundant power and they did have a power outage last year)
  • Persistent, fiber-attached storage: no local storage, no cheap iSCSI, no NFS, no crap RAID controllers, and no need to worry about APIs or other special tools to access the storage; it's as if it were local
  • Fairly nice user interface that allows me to self provision VMs, IPs etc

Other things they offer that I don't care about (for this situation; elsewhere they could come in real handy):

  • Built in load balancing via Citrix Netscalers
  • Built in firewalls via Cisco ASAs

So for me, a meager configuration of 1 vCPU, 1.5GB of memory, and 40GB of disk space with a single external static IP comes at a reasonable cost (pricing is available here):

  • CPU/Memory: $65/mo [+$1,091/mo if I opted for 8-cores and 16GB/ram]
  • Disk space: $10/mo [+$30/mo if I wanted 160GB of disk space]
  • 1 IP address: $7.20/mo
  • 100GB data transfer: $17/mo (bandwidth is cheap at these levels so just picked a round number)
  • Total: $99/mo

That comes to about the same as what I'm paying in co-location fees now. If that were all the costs, I'd sign up in a second, but unfortunately their model puts a significant premium on "IP Services", when ideally what I'd like is just a flat layer 3 connection to the internet. The charge is $7.20/mo for each TCP or UDP port you need opened to your system, so for me:

  • HTTP – $7.20/mo
  • HTTPS – $7.20/mo
  • SMTP – $7.20/mo
  • DNS/TCP – $7.20/mo
  • DNS/UDP – $7.20/mo
  • VPN/UDP – $7.20/mo
  • SSH – $7.20/mo
  • Total: $50/mo

And I’m being conservative here, I could be opening up:

  • POP3
  • POP3 – SSL
  • IMAP4
  • IMAP4 – SSL
  • Identd
  • Total: another $36/mo

But I'm not, for now. Then you can double all of that for my 2nd system, so assuming I do go forward with deploying the second system, my total cost (including those extra ports) is roughly $353/mo (I left out a second 100GB/mo of bandwidth). Extrapolate that out to three years:

  • First year: $4,236 ($353/mo)
  • First two years: $8,472
  • First three years: $12,708

Compared to doing it on my own:

  • First year: ~$6,200 (with new $5,000 server)
  • First two years: ~$7,400
  • First three years: ~$8,600

And if you really want to see how this cost structure doesn't scale, let's do a more apples-to-apples comparison: take the CPU/memory I'd have in my own server and put it in the cloud:

  • First year: $15,328 [8 cores, 16GB RAM, 160GB disk]
  • First two years: $30,657
  • First three years: $45,886

As you can see the model falls apart really fast.
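
Here is the arithmetic behind those tables in one place (my own sketch; the prices are the ones quoted above, and the port counts and bandwidth are the ones I listed):

# The arithmetic behind the tables above, in one place. Prices are the ones
# quoted earlier; the port counts and bandwidth figure are my configuration.
PORT_FEE = 7.20                      # per TCP or UDP port opened, per month
IP_FEE   = 7.20                      # per public IP, per month

def vm_monthly(cpu_mem=65, disk=10, ips=1, ports=0, transfer=0):
    return cpu_mem + disk + ips * IP_FEE + ports * PORT_FEE + transfer

base      = vm_monthly(transfer=17)              # 1 vCPU / 1.5GB / 40GB + 100GB transfer
essential = vm_monthly(transfer=17, ports=7)     # + HTTP/HTTPS/SMTP/DNSx2/VPN/SSH
padded    = vm_monthly(transfer=17, ports=12)    # + the POP3/IMAP/identd ports as well
two_vms   = padded + vm_monthly(ports=12)        # second VM, no second 100GB of transfer

print(f"base VM:        ${base:.2f}/mo")         # ~ $99
print(f"essential VM:   ${essential:.2f}/mo")    # ~ $150
print(f"two padded VMs: ${two_vms:.2f}/mo")      # ~ $353 (rounding differs slightly)
for years in (1, 2, 3):
    print(f"  {years} year(s), two VMs in the cloud: ${two_vms * 12 * years:,.0f}")

# Self-hosting comparison: a ~$5,000 server up front plus ~$100/mo co-location.
for years in (1, 2, 3):
    print(f"  {years} year(s) self-hosted: ${5000 + 100 * 12 * years:,.0f}")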

Clearly it doesn't make a lot of sense to do all of that at once, so if I collapse it to only the essential services on the cloud side:

  • First year: $3,420 ($270/mo)
  • First two years: $6,484
  • First three years: $9,727

I could live with that over three years, especially if the system is reliable and maintains my data integrity. But if they added just one feature for lil ol me, it would be a "forwarding VIP" on their load balancers: basically, just forward everything from this external IP to this internal IP. I know their load balancers can do it; it's just a matter of exposing the functionality. This would dramatically cut the costs:

  • First year: $2,517 ($210/mo)
  • First two years: $5,035
  • First three years: $7,552
  • First four years: $10,070

You can see how the model doesn't scale: I am talking about 2 vCPUs' worth of power and 3GB of memory, compared to at least an 8-12 core physical server with 16GB or more of memory if I did it myself. But again, I have no use for that extra capacity if I did it myself, so it would just sit idle, like it does today.

CPU usage is higher than I mentioned above, I believe because of a bug in VMware Server 2.0 that causes CPU to "leak" somehow, resulting in a steady, linear increase in CPU usage over time. I reported it on the forums but didn't get a reply, and I don't care enough to try to engage VMware support; they didn't help me much with ESX and a support contract, so they would do even less for VMware Server with no support contract.

I signed up for Terremark’s vCloud Express program a couple of months ago, installed a fresh Debian 5.0 VM, and synchronized my data over to it from one of my existing co-located VMs.

So today I have officially transferred all of my services (except DNS) from one of my two co-located VMs to Terremark, and will run it for a while and see how the costs work out, how it performs, how reliable it is, etc. My co-location contract is up for renewal in September, so I have plenty of time to determine whether or not I want to make the jump. I'm hoping I can make it work, as it would be nice not to have to worry about hardware anymore. An excerpt from that link:

[..] My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I'll also think long and hard, and probably consolidate both of my co-located VMs into a single VM at Terremark if I do go that route, which will save me a lot. I really prefer two VMs, but I don't think I should be charged double for two, especially when two are going to use roughly the same amount of resources as one. They talk all about "pay for what you use", when that is not correct; the only portion of their service that is pay-for-what-you-use is bandwidth. Everything else is "pay as you provision". So if you provision 100GB and a 4-CPU VM but never turn it on, well, you're still going to pay for it.

The model needs significant work; hopefully it will improve in the future, as all of these cloud companies are still trying to figure this stuff out. I know some people at Terremark and will pass this along to see what they think. Terremark is not alone in this model; I'm not picking on them for any reason other than that I use their services. I think in some situations it can make sense, but the use cases are pretty limited at this point. You probably know that I wouldn't sign up and commit to such a service unless I thought it could provide some good value!

Part of the issue may very well be limitations in the hypervisor itself with regard to reporting actual usage. As VMware and others improve the instrumentation of their systems, the cost model for customers could improve significantly, perhaps with things like charging for CPU based on a 95th percentile model, the way we measure bandwidth. They could also do things like cost capping, where if your resource usage runs high for an extended period the provider automatically throttles your system(s) to keep your bill lower (at your request, of course).
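
Billing CPU the way transit bandwidth is billed might look something like this (a sketch of the idea only, not anything Terremark or VMware actually offers):

# Sketch of 95th-percentile billing applied to CPU, the way transit bandwidth
# is commonly billed: sample usage at regular intervals over the month, throw
# away the top 5% of samples, and bill for the highest remaining sample.
import random

def ninety_fifth_percentile(samples):
    ordered = sorted(samples)
    index = int(len(ordered) * 0.95) - 1      # drop the top 5% of samples
    return ordered[max(index, 0)]

# Simulated month of 5-minute CPU samples: mostly idle with occasional spikes.
random.seed(1)
samples = [random.uniform(3, 8) if random.random() < 0.97 else random.uniform(40, 90)
           for _ in range(30 * 24 * 12)]

billable = ninety_fifth_percentile(samples)
print(f"average usage: {sum(samples) / len(samples):.1f}% of a vCPU")
print(f"95th percentile (billable): {billable:.1f}% of a vCPU")
# The occasional spikes get ignored; you pay for what you use most of the
# time, not for what you provisioned.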

Another idea would be a more direct physical-to-virtual mapping, where I could provision, say, 1 physical CPU and X amount of memory, and then run as many VMs as I want within that one CPU core and that memory. Maybe I just need 1:1, or maybe my resource usage is low enough that I can get 5:1 or 10:1; after all, one of the biggest benefits of virtualization is being able to better isolate workloads. Terremark already does this to some degree in their enterprise products, but this model isn't available for vCloud Express, at least not yet.

You know what surprised me most, next to the charges for IP services? How cheap enterprise storage is for these cloud companies. I mean, $10/mo for 40GB of space on a high end storage array? I can go out and buy a pretty nice server to host VMs at a facility of my choosing, but if I want a nice storage array to back it up I'm looking at easily tens of thousands of dollars. I just would have expected storage to be a bigger piece of the pie when it came to overall costs, when in my case it can be as low as 3-5% of the total cost over a three year period.

And despite Terremark listing Intel as a partner, my VM happens to be running on (you guessed it) AMD:

yehat:/var/log# cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 4
model name    : Quad-Core AMD Opteron(tm) Processor 8389
stepping    : 2
cpu MHz        : 2913.037

AMD gets no respect I tell ya, no respect! 🙂

I really want this to work out.

April 2, 2010

Grid Iron decloaks

Filed under: News,Storage — Nate @ 10:30 am

Grid Iron Systems seems to have left stealth mode somewhat recently. They are another startup making an accelerator appliance that sits in between your storage and your server(s). Kind of like what Avere does on the NAS side, Grid Iron does on the SAN side with their "TurboCharger".

It certainly looks like an interesting product, but it appears they make it "safe" by caching only reads; I want an SSD system that can cache writes too! (Yes, I know that wears the SSDs out faster, but just do warranty replacements.) I look forward to seeing some SPC-1 numbers on how Grid Iron can accelerate systems, and at the same time I look forward to SPC-1 numbers on how automatic storage tiering can accelerate systems as well.

I’d also be interested in seeing how Grid Iron can accelerate NetApp systems vs using NetApp’s own read-only PAM (since Grid Iron specifically mentions NetApp in their NAS accelerator, although yes I’m sure they just used NetApp as an example).

April 1, 2010

New IBM blades based on Intel 7500 announced

Filed under: News,Virtualization — Nate @ 7:46 pm

The Register had the scoop a while back, but apparently today they were officially announced. IBM did some trickery with the new 7500 series Intel Xeons to accomplish two things:

  • Expand the amount of memory available to the system
  • Be able to “connect” two dual socket blades to form a single quad socket system

Pretty creative, though the end result wasn’t quite as impressive as it sounded up front. Their standard blade chassis is 9U and has 14 slots on it.

  • Each blade is dual socket, maximum 16 cores, and 16 DIMMs
  • Each memory extender offers 24 additional DIMMs

So for the chassis as a whole you're talking about 7 dual-socket systems with 40 DIMMs each, or 3 quad-socket systems with 80 DIMMs each plus 1 dual-socket with 40.
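
The slot math works out like this (my own arithmetic, assuming each blade and each memory extender occupies one of the 14 slots):

# Chassis arithmetic, assuming each HX5 blade and each memory extender
# occupies one slot in the 14-slot, 9U BladeCenter chassis.
SLOTS, BLADE_DIMMS, EXTENDER_DIMMS, CORES_PER_BLADE = 14, 16, 24, 16

# Dual-socket systems: 1 blade + 1 extender = 2 slots, 40 DIMMs each.
dual_systems = SLOTS // 2
print(f"{dual_systems} dual-socket systems, "
      f"{BLADE_DIMMS + EXTENDER_DIMMS} DIMMs each, "
      f"{dual_systems * CORES_PER_BLADE} cores and "
      f"{dual_systems * (BLADE_DIMMS + EXTENDER_DIMMS)} DIMM slots per chassis")

# Quad-socket systems: 2 blades + 2 extenders = 4 slots, 80 DIMMs each;
# three of those fill 12 slots, leaving 2 slots for one more dual-socket system.
quad_systems = SLOTS // 4
leftover_duals = (SLOTS - quad_systems * 4) // 2
print(f"{quad_systems} quad-socket systems (80 DIMMs each) "
      f"plus {leftover_duals} dual-socket system (40 DIMMs)")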

Compare that to an Opteron 6100 based system, where you could get 8 quad-socket systems with 48 DIMMs each in a single enclosure (granted, such a system has not been announced yet, but I am confident it will be).

  • Intel 7500-based system: 112 CPU cores (1.8GHz), 280 DIMM slots, 9U
  • Opteron 6100-based system: 384 CPU cores (2.2GHz), 384 DIMM slots, 10U

And the price of the IBM system is even less impressive:

In a base configuration with a single four-core 1.86 GHz E7520 processor and 8 GB of memory, the BladeCenter HX5 blade costs $4,629. With two of the six-core 2 GHz E7540 processors and 64 GB of memory, the HX5 costs $15,095.

They don’t seem to show pricing for the 8 core 7500-based blade, and say there is no pricing or ETA on the arrival of the memory extenders.

They do say this, which is interesting (though not surprising):

The HX5 blade cannot support the top-end eight-core Xeon 7500 parts, which have a 130 watt thermal design point, but it has been certified to support the eight-core L7555, which runs at 1.86 GHz, has 24 MB of L3 cache, and is rated at 95 watts.

I only hope AMD has enough manufacturing capacity to keep up with demand; the Opteron 6100s will wipe the floor with the Intel chips on price/performance (for the first time in a while).
