TechOpsGuys.com Diggin' technology every day

April 3, 2010

Terremark vCloud Express: Day 1

Filed under: Virtualization — Nate @ 12:19 pm

You may have read another of my blog entries, “Why I hate the cloud”; I also mentioned in “Lesser of two evils” that I’ve been hosting my own email and other services for more than a decade.

So what’s this about? I still hate the cloud for any sort of large scale deployment, but for micro deployments it can almost make sense. Let me explain my situation:

About 9 years ago the ISP I used to help operate more or less closed up shop, and I relocated what was left of the customers to my home DSL line (1mbps/1mbps, 8 static IPs) on a dedicated little server. My ISP got bought out, then got bought out again, and started jacking up the rates (from $20/mo to ~$100/mo, plus ~$90/mo for Qwest professional DSL). Hosting at my apartment was convenient, but at the same time it was a sort of ball and chain, as it made it very difficult to move. As for coordinating the telco move and the ISP move with minimal downtime, well, let’s just say with DSL that’s about impossible. I managed to mitigate one move in 2001 by temporarily locating my servers at my “normal” company’s network for a few weeks while things got moved.

A few years ago I was also hit with a 27 hour power outage (despite being located in a downtown metropolitan area, everyone got hit by that storm). Shortly after that I decided a co-location was the best fit for me longer term. So phase one was to virtualize the pair of systems in VMware. I grabbed an older server I had lying around and did that, and ran it for a year; it worked great (though the server was really loud).

Then I got another email saying my ISP was bought out yet again, and this time the company was going to force me to change my IP addresses, which when you’re hosting your own DNS can be problematic. So that was the last straw. I found a nice local company to host my server at a reasonable price. The facility wasn’t world class by any stretch, but the world class facilities in the area had little interest in someone wanting to host a single 1U box that averages less than 128kbps of traffic at any given time. But it would do for now.

I run my services on a circa 2004 dual Xeon system with 6GB of memory and ~160GB of disk on a 3Ware 8006-2 RAID controller (RAID 1). I absolutely didn’t want to go to one of those cheap crap hosting providers with massive downtime and no SLAs. I also had absolutely no faith in the earlier generation of “UML”-style VMs (yes, I know Xen and UML aren’t the same, but I trust them the same amount – i.e. none). My data and privacy are fairly important to me and I am willing to pay extra to try to maintain them.

So early last year my RAID card told me one of my disks was about to fail and to replace it, so I did, rebuilt the array, and off I went again. A few months later the RAID card told me another disk was about to fail (there are only two disks in this system), so I replaced that disk, rebuilt, and off I went. Then a few months later, the RAID card again said a disk was not behaving right and I should replace it. Three disk replacements called for in less than a year. Really it’s only been two, though; I’ve ignored the most recent failing drive for several months now. Media scans return no errors, but RAID integrity checks always fail, causing a RAID rebuild (this happens once a week). Support says the disk is suffering from timeouts. There is no backplane on the system (and thus no hot swap, making disk replacements difficult). Basically I’m getting tired of maintaining hardware.

I looked at the cost of a good quality server with hot swap, remote management, and so on, something that can run ESX, and the cost is $3-5k. I could go $2-3k and stick to VMware Server on top of Debian; a local server manufacturer has their headquarters literally less than a mile from my co-location, so it is tempting to keep doing it on my own, and if my needs were greater then I would for sure. Cloud does not make sense in most cases in my opinion, but in this case it can.

If I try to price out a cloud option that would match that $3-5k server, purely from a CPU/memory perspective the cloud option would be significantly more. But I looked closer and I really don’t need that much capacity for my stuff. My current VMware host runs at ~5-8% CPU usage on average on six-year-old hardware. I have 6GB of RAM but I’m only using 2-3GB at best. Storage is the biggest headache for me right now hosting my own stuff.

So I looked to Terremark, who seem to have a decent operation going; for the most part they know what they are doing (they still make questionable decisions, though I think most of those are not made by the technical teams). I looked to them for a few reasons:

  • Enterprise storage either from 3PAR or EMC (storage is most important for me right now given my current situation)
  • Redundant networking
  • Tier IV facilities (my current facility lacks true redundant power and they did have a power outage last year)
  • Persistent, fiber-attached storage; no local storage, no cheap iSCSI, no NFS, no crap RAID controllers, and no need to worry about using APIs or other special tools to access storage; it behaves as if it were local
  • Fairly nice user interface that allows me to self provision VMs, IPs etc

Other things they offer that I don’t care about (for this situation; in others they could come in really handy):

  • Built in load balancing via Citrix Netscalers
  • Built in firewalls via Cisco ASAs

So for me, a meager configuration of 1 vCPU, 1.5GB of memory, and 40GB of disk space with a single external static IP comes at a reasonable cost (pricing is available here):

  • CPU/Memory: $65/mo [+$1,091/mo if I opted for 8 cores and 16GB of RAM]
  • Disk space: $10/mo [+$30/mo if I wanted 160GB of disk space]
  • 1 IP address: $7.20/mo
  • 100GB data transfer: $17/mo (bandwidth is cheap at these levels so just picked a round number)
  • Total: $99/mo

That comes to about the same as what I’m paying in co-location fees now. If that were all of the costs I’d sign up in a second, but unfortunately their model puts a significant premium on “IP Services”, when ideally what I’d like is just a flat layer 3 connection to the internet. The charge is $7.20/mo for each TCP or UDP port you need opened to your system, so for me:

  • HTTP – $7.20/mo
  • HTTPS – $7.20/mo
  • SMTP – $7.20/mo
  • DNS/TCP – $7.20/mo
  • DNS/UDP – $7.20/mo
  • VPN/UDP – $7.20/mo
  • SSH – $7.20/mo
  • Total: $50/mo
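
To keep the arithmetic straight, here’s a quick Python sketch of how a monthly bill adds up under this model. The prices are just the list prices I quoted above and the port list is my own; this isn’t any kind of official Terremark calculator:

# Rough vCloud Express monthly cost model, using the list prices quoted above.
BASE_ITEMS = {
    "1 vCPU / 1.5GB memory": 65.00,
    "40GB disk": 10.00,
    "1 static IP": 7.20,
    "100GB transfer": 17.00,
}
PORT_FEE = 7.20  # per TCP or UDP port opened via "IP Services"
ports = ["HTTP", "HTTPS", "SMTP", "DNS/TCP", "DNS/UDP", "VPN/UDP", "SSH"]

base = sum(BASE_ITEMS.values())        # ~$99/mo
ip_services = PORT_FEE * len(ports)    # ~$50/mo for the 7 ports above
print("base: $%.2f/mo, IP services: $%.2f/mo, total: $%.2f/mo"
      % (base, ip_services, base + ip_services))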

And I’m being conservative here; I could be opening up:

  • POP3
  • POP3 – SSL
  • IMAP4
  • IMAP4 – SSL
  • Identd
  • Total: another $36/mo

But I’m not, at least for now. Then you can double all of that for my second system, so assuming I do go forward with deploying the second system my total cost (including those extra ports) is roughly $353/mo (I left out a second 100GB/mo of bandwidth). Extrapolate that out three years:

  • First year: $4,236 ($353/mo)
  • First two years: $8,472
  • First three years: $12,708

Compared to doing it on my own:

  • First year: ~$6,200 (with new $5,000 server)
  • First two years: ~$7,400
  • First three years: ~$8,600
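
The extrapolation is just multiplication, but here’s a little sketch so you can tweak the numbers yourself. It uses the same rough estimates as above: $353/mo for the two-VM cloud setup versus a $5,000 server up front plus roughly $100/mo in co-location fees (my own guesses, not quotes):

def cloud_cost(months, monthly=353.0):
    # cloud: purely recurring, no up-front hardware purchase
    return monthly * months

def diy_cost(months, server=5000.0, colo_monthly=100.0):
    # self-hosted: buy the server up front, then pay co-location fees
    return server + colo_monthly * months

for years in (1, 2, 3):
    months = 12 * years
    print("year %d: cloud ~$%.0f vs self-hosted ~$%.0f"
          % (years, cloud_cost(months), diy_cost(months)))

The crossover lands somewhere in the second year, which is why the cheaper scenarios below (essential ports only, or a forwarding VIP) matter so much.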

And if you really want to see how this cost structure doesn’t scale, let’s take a more apples-to-apples comparison of the CPU/memory I’d have in my own server and put it in the cloud:

  • First year – $15,328 [ 8 cores, 16GB ram 160GB disk ]
  • First two years – $30,657
  • First three years – $45,886

As you can see the model falls apart really fast.

So clearly it doesn’t make a lot of sense to do all of that at once. If I collapse it to only the essential services on the cloud side:

  • First year: $3,420 ($270/mo)
  • First two years: $6,484
  • First three years: $9,727

I could live with that over three years, especially if the system is reliable and maintains my data integrity. But if they added just one feature for lil ol’ me, it would be a “forwarding VIP” on their load balancers that would basically just forward everything from one external IP to an internal IP. I know their load balancers can do it; it’s just a matter of exposing the functionality. This would dramatically impact the costs:

  • First year: $2,517 ($210/mo)
  • First two years: $5,035
  • First three years: $7,552
  • First four years: $10,070

You can see how the model doesn’t scale: I am talking about 2 vCPUs worth of power and 3GB of memory, compared to at least an 8-12 core physical server and 16GB or more of memory if I did it myself. But again, I have no use for that extra capacity if I did it myself, so it would just sit idle, like it does today.

CPU usage is higher than I mentioned above, I believe because of a bug in VMware Server 2.0 that causes CPU to “leak” somehow, resulting in a steady, linear increase in CPU usage over time. I reported it to the forums but didn’t get a reply, and I don’t care enough to try to engage VMware support; they didn’t help me much with ESX and a support contract, so they would do even less for VMware Server and no support contract.

I signed up for Terremark’s vCloud Express program a couple of months ago, installed a fresh Debian 5.0 VM, and synchronized my data over to it from one of my existing co-located VMs.

So today I have officially transferred all of my services (except DNS) from one of my two co-located VMs to Terremark, and I will run it for a while to see how the costs are, how it performs, reliability, etc. My co-location contract is up for renewal in September, so I have plenty of time to determine whether or not I want to make the jump. I’m hoping I can make it work, as it will be nice to not have to worry about hardware anymore. An excerpt from that link:

[..] My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I’ll also think long and hard about it, and will probably consolidate both of my co-located VMs into a single VM at Terremark if I do go that route, which will save me a lot. I really prefer two VMs, but I don’t think I should be charged double for two, especially when the two will use roughly the same amount of resources as one. They talk all about “pay for what you use”, but that is not correct; the only portion of their service that is pay-for-what-you-use is bandwidth. Everything else is “pay as you provision”. So if you provision 100GB and a 4-CPU VM but never turn it on, well, you’re still going to pay for it.

The model needs significant work, and hopefully it will improve in the future; all of these cloud companies are still trying to figure this stuff out. I know some people at Terremark and will pass this along to them to see what they think. Terremark is not alone in this model; I’m not picking on them for any reason other than that I use their services. I think in some situations it can make sense, but the use cases are pretty narrow at this point. You probably know that I wouldn’t sign up and commit to such a service unless I thought it could provide some good value!

Part of the issue may very well be limitations in the hypervisor itself with regards to reporting actual usage; as VMware and others improve the instrumentation of their systems, the cost model for customers could improve significantly, perhaps by doing things like charging for CPU usage on a 95th percentile model, the way we measure bandwidth. Providers could also offer things like cost capping, where if your resource usage is higher for an extended period the provider automatically throttles your system(s) to keep your bill lower (at your request of course).
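
To illustrate what 95th percentile CPU billing could look like (this is purely my own sketch of the idea, not something Terremark or VMware actually offers today): the math is the same as burstable bandwidth billing. Sample utilization at a fixed interval, throw away the top 5% of samples, and bill on the highest remaining value; short bursts stop costing you money.

import random

def percentile_95(samples):
    # sort the samples, throw away the top 5%, bill on the highest remaining one
    ordered = sorted(samples)
    index = int(len(ordered) * 0.95) - 1
    return ordered[max(index, 0)]

# Fake a month of five-minute CPU utilization samples: mostly idle with a few bursts.
samples = [random.uniform(2, 8) for _ in range(8640)]
samples += [random.uniform(60, 90) for _ in range(200)]
print("billable CPU: %.1f%%" % percentile_95(samples))  # the bursts are ignored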

Another idea would be more accurate physical-to-virtual mapping, where I could provision, say, 1 physical CPU core and X amount of memory, and then provision unlimited VMs inside that one core and that memory. Maybe I just need 1:1, or maybe my resource usage is low enough that I can get 5:1 or 10:1; after all, one of the biggest benefits of virtualization is being able to better isolate workloads. Terremark already does this to some degree on their enterprise products, but this model isn’t available for vCloud Express, at least not yet.

You know what surprised me most, next to the charges for IP services? How cheap enterprise storage is for these cloud companies. I mean, $10/mo for 40GB of space on a high end storage array? I can go out and buy a pretty nice server to host VMs at a facility of my choosing, but if I want a nice storage array to back it up I’m looking at easily tens of thousands of dollars. I just would have expected storage to be a bigger piece of the pie when it comes to overall costs, when in my case it can be as low as 3-5% of the total cost over a 3 year period.

And despite Terremark listing Intel as a partner, my VM happens to be running on – you guessed it – AMD:

yehat:/var/log# cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 4
model name    : Quad-Core AMD Opteron(tm) Processor 8389
stepping    : 2
cpu MHz        : 2913.037

AMD gets no respect I tell ya, no respect! 🙂

I really want this to work out.

April 2, 2010

Grid Iron decloaks

Filed under: News, Storage — Nate @ 10:30 am

Grid Iron Systems seems to have left stealth mode somewhat recently. They are another start-up that makes an accelerator appliance that sits in between your storage and your server(s). Kind of like what Avere does on the NAS side, Grid Iron does on the SAN side with their “TurboCharger“.

It certainly looks like an interesting product, but it appears they make it “safe” by having it cache only reads; I want an SSD system that can cache writes too! (Yes, I know that wears the SSDs out faster I’m sure, but just do warranty replacement.) I look forward to seeing some SPC-1 numbers on how Grid Iron can accelerate systems, and at the same time I look forward to SPC-1 numbers on how automatic storage tiering can accelerate systems as well.

I’d also be interested in seeing how Grid Iron can accelerate NetApp systems vs using NetApp’s own read-only PAM (since Grid Iron specifically mentions NetApp in their NAS accelerator, although yes I’m sure they just used NetApp as an example).

April 1, 2010

New IBM blades based on Intel 7500 announced

Filed under: News, Virtualization — Nate @ 7:46 pm

The Register had the scoop a while back, but apparently today they were officially announced. IBM did some trickery with the new 7500 series Intel Xeons to accomplish two things:

  • Expand the amount of memory available to the system
  • Be able to “connect” two dual socket blades to form a single quad socket system

Pretty creative, though the end result wasn’t quite as impressive as it sounded up front. Their standard blade chassis is 9U and has 14 slots on it.

  • Each blade is dual socket, maximum 16 cores, and 16 DIMMs
  • Each memory extender offers 24 additional DIMMs

So for the chassis as a whole you’re talking about 7 dual socket systems with 40 DIMMs each, or 3 quad socket systems with 80 DIMMs each plus 1 dual socket with 40.

Compare that to an Opteron 6100 setup, where you can get 8 quad socket systems with 48 DIMMs each in a single enclosure (granted, such a system has not been announced yet, but I am confident it will be):

  • Intel 7500-based system: 112 CPU cores (1.8GHz), 280 DIMM slots – 9U
  • Opteron 6100-based system: 384 CPU cores (2.2GHz), 384 DIMM slots – 10U
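
Here’s the chassis arithmetic spelled out, a quick sketch using the slot and DIMM counts above; the Opteron side assumes the not-yet-announced 8-node, 48-DIMM enclosure I mentioned:

# IBM blade chassis: 14 slots, each HX5 compute blade paired with a memory extender
blades = 14 // 2                   # 7 two-socket blades + 7 extenders
hx5_cores = blades * 2 * 8         # 2 sockets x 8 cores each     = 112 cores
hx5_dimms = blades * (16 + 24)     # 16 on-blade + 24 on extender = 280 DIMMs

# Hypothetical Opteron 6100 enclosure: 8 quad-socket 12-core nodes, 48 DIMMs each
opteron_cores = 8 * 4 * 12         # = 384 cores
opteron_dimms = 8 * 48             # = 384 DIMMs
print("HX5: %d cores, %d DIMMs; Opteron: %d cores, %d DIMMs"
      % (hx5_cores, hx5_dimms, opteron_cores, opteron_dimms))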

And the price of the IBM system is even less impressive –

In a base configuration with a single four-core 1.86 GHz E7520 processor and 8 GB of memory, the BladeCenter HX5 blade costs $4,629. With two of the six-core 2 GHz E7540 processors and 64 GB of memory, the HX5 costs $15,095.

They don’t seem to show pricing for the 8 core 7500-based blade, and say there is no pricing or ETA on the arrival of the memory extenders.

They do say this which is interesting (not surprising) –

The HX5 blade cannot support the top-end eight-core Xeon 7500 parts, which have a 130 watt thermal design point, but it has been certified to support the eight-core L7555, which runs at 1.86 GHz, has 24 MB of L3 cache, and is rated at 95 watts.

I only hope AMD has enough manufacturing capacity to keep up with demand; the Opteron 6100s will wipe the floor with the Intel chips on price/performance (for the first time in a while).

March 30, 2010

Tough choice

Filed under: News — Nate @ 3:16 pm

I thought the Intel Xeon 7500 (Nehalem-EX) processors were supposed to launch today, and I was confused earlier when I didn’t see any news on it, but finally The Register posted some news.

I will keep it short and sweet this time:

  • Intel Nehalem-EX L7555 1.86GHz 95W 8-core processor price is $3,157 (in 1,000-unit quantities)
  • AMD Opteron 6174 2.2GHz 80W 12-core processor price is $1,165 (in 1,000-unit quantities)

Tough choice indeed.

If you want more than four sockets, of course, you will likely not be able to use an off-the-shelf Opteron 6100 system (a vendor may need to build their own chipset), but the justification behind this decision by AMD seems very reasonable:

That 100 GB/sec of memory bandwidth in the forthcoming four-socket Opteron 6100 machine is one of the reasons why AMD decided not to do eight-socket configurations with these chips. “It is a lot of development work for a not very large market,” says Fruehe, who reckons that there are only about 1,800 x64-based eight-socket servers sold worldwide each quarter and that the number is dwindling as four-socket boxes get more powerful.

“Intel is raving about having 15 different designs for its upcoming Nehalem EX machines, but how many is each vendor going to get out of those 1,800 units?” Fruehe says that the 8P boxes account for less than two-tenths of a percent of current shipments each quarter, and that while 4P boxes are only accounting for around 4 per cent, “even though it is a small space, AMD needs to be there.”

March 29, 2010

Opteron 6100s are here

Filed under: News, Virtualization — Nate @ 8:28 am

UPDATED I’ve been waiting for this for quite some time, and finally the 12-core AMD Opteron 6100s have arrived. AMD did the right thing this time by not waiting to develop a “true” 12-core chip and instead bolting a pair of CPU dies together into a single package. You may recall AMD lambasted Intel when it released its first four core CPUs a few years ago (composed of a pair of two-core chips bolted together), a strategy that paid off well for Intel; AMD’s market share was hurt badly as a result, a painful lesson which they learned from.

I’d of course rather have a “true” 12-core processor, but I’m very happy to make do with these Opteron 6100s in the meantime; I don’t want to have to wait another 2-3 years to get 12 cores in a socket.

Some highlights of the processor:

  • Clock speeds ranging from 1.7GHz (65W) to 2.2GHz (80W), with a turbo boost 2.3GHz model coming in at 105W
  • Prices ranging from $744 to $1,396 in 1,000-unit quantities
  • Twelve-core and eight-core models, with 512KB of L2 cache per core and 12MB of shared L3 cache
  • Quad-channel LV & U/RDDR3 with ECC, and support for on-line spare memory
  • Supports up to 3 DIMMs per channel, up to 12 DIMMs per CPU
  • Quad 16-bit HyperTransport™ 3 technology (HT3) links, up to 6.4 GT/s per link (more than triple HT1 performance)
  • AMD SR56x0 chipset with I/O Virtualization and PCIe® 2.0
  • Socket compatibility with the planned AMD Opteron™ 6200 Series processors (16 cores?)
  • New advanced idle states allowing the processor to idle with less power usage than the previous six-core systems (AMD seems to have long had the lead in idle power conservation)


The new I/O virtualization looks quite nice as well – AMD-V 2.0, from their site:

Hardware features that enhance virtualization:

  • Unmatched Memory Bandwidth and Scalability – Direct Connect Architecture 2.0 supports a larger number of cores and memory channels so you can configure robust virtual machines, allowing your virtual servers to run as close as possible to physical servers.
  • Greater I/O virtualization efficiencies – I/O virtualization to help increase I/O efficiency by supporting direct device assignment, while improving address translation to help improve the levels of hypervisor intervention.
  • Improved virtual machine integrity and security – With better isolation of virtual machines through I/O virtualization, helps increase the integrity and security of each VM instance.
  • Efficient Power Management – AMD-P technology is a suite of power management features that are designed to drive lower power consumption without compromising performance. For more information on AMD-P, click here
  • Hardware-assisted Virtualization – AMD-V technology to enhance and accelerate software-based virtualization so you can run more virtual machines, support more users and transactions per virtual machine with less overhead. This includes Rapid Virtualization Indexing (RVI) to help accelerate the performance of many virtualized applications by enabling hardware-based VM memory management. AMD-V technology is supported by leading providers of hypervisor and virtualization software, including Citrix, Microsoft, Red Hat, and VMware.
  • Extended Migration – a hardware feature that helps virtualization software enable live migration of virtual machines between all available AMD Opteron™ processor generations. For a closer look at Extended Migration, follow this link.

I’m happy to see AMD returning to the chipset design business as well; I was never comfortable with Nvidia as a server chipset maker.

The Register has a pair of great articles on the launch as well, though in the main one I was kind of annoyed that I had to scroll so far to get past the Xeon news; I don’t think they needed to recap it in such detail in an article about the Opterons, but oh well.

I thought this was an interesting note on the recent Intel announcement of integrated silicon for encryption –

While Intel was talking up the fact that it had embedded cryptographic instructions in the new Xeon 5600s to implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data, Opterons have had this feature since the quad-core “Barcelona” Opterons came out in late 2007, er, early 2008.

And as for performance –

Generally speaking, bin for bin, the twelve-core Magny-Cours chips provide about 88 per cent more integer performance and 119 per cent more floating point performance than the six-core “Istanbul” Opteron 2400 and 8400 chips they replace..

AMD seems geared towards reducing costs and prices as well –

The Opteron 6100s will compete with the high-end of the Xeon 5600s in the 2P space and also take the fight on up to the 4P space. But, AMD’s chipsets and the chips themselves are really all the same. It is really a game of packaging some components in the stack up in different ways to target different markets.

Sounds like a great way to keep costs down by limiting the amount of development required to support the various configurations.

AMD themselves also blogged on the topic with some interesting tidbits of information –

You’re probably wondering why we wouldn’t put our highest speed processor up in this comparison. It’s because we realize that while performance is important, it is not the most important factor in server decisions.  In most cases, we believe price and power consumption play a far larger role.

[..]

Power consumption – Note that to get to the performance levels that our competitor has, they had to utilize a 130W processor that is not targeted at the mainstream server market, but is more likely to be used in workstations. Intel isn’t forthcoming on their power numbers so we don’t really have a good measurement of their maximum power, but their 130W TDP part is being beaten in performance by our 80W ACP part.  It feels like the power efficiency is clearly in our court.  The fact that we have doubled cores and stayed in the same power/thermal range compared to our previous generation is a testament to our power efficiency.

Price – This is an area that I don’t understand.  Coming out of one of the worst economic times in recent history, why Intel pushed up the top Xeon X series price from $1386 to $1663 is beyond me.  Customers are looking for more, not less for their IT dollar.  In the comparison above, while they still can’t match our performance, they really fall short in pricing.  At $1663 versus our $1165, their customers are paying 42% more money for the luxury of purchasing a slower processor. This makes no sense.  Shouldn’t we all be offering customers more for their money, not less?

In addition to our aggressive 2P pricing, we have also stripped away the “4P tax.” No longer do customers have to pay a premium to buy a processor capable of scaling up to 4 CPUs in a single platform.  As of today, the 4P tax is effectively $0. Well, of course, that depends on you making the right processor choice, as I am fairly sure that our competitor will still want to charge you a premium for that feature.  I recommend you don’t pay it.

As a matter of fact, a customer will probably find that a 4P server, with 32 total cores (4 x 8-core) based on our new pricing, will not only perform better than our competitor’s highest end 2P system, but it will also do it for a lower price. Suddenly, it is 4P for the masses!

While for the most part I am mainly interested in their 12-core chips, I also see significant value in the 8-core parts; being able to replace a pair of 4-core chips with a single socket 8-core system is very appealing in certain situations. There is a decent premium on motherboards that need to support more than one socket, so being able to get 8 (and maybe even 12) cores in a single socket system is just outstanding.

I also found this interesting –

Each one is capable of 105.6 Gigaflops (12 cores x 4 32-bit FPU instructions x 2.2GHz).  And that score is for the 2.2GHz model, which isn’t even the fastest one!

I still have a poster up on one of my walls from the 1995-1996 era on the world’s first teraflop machine, which was –

The one-teraflops demonstration was achieved using 7,264 Pentium Pro processors in 57 cabinets.

With the same number of these new Opterons you could get three quarters of the way to a petaflop.
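
Here’s the back-of-the-envelope math for anyone who wants to check it, using the quoted 105.6 gigaflops per socket and the 7,264 processors from the poster:

gflops_per_socket = 12 * 4 * 2.2      # 12 cores x 4 FP ops/cycle x 2.2GHz = 105.6 GFLOPS
teraflop_machine_cpus = 7264          # the 1996 machine's Pentium Pro count
total_gflops = gflops_per_socket * teraflop_machine_cpus
print("%.0f GFLOPS, or %.2f petaflops" % (total_gflops, total_gflops / 1e6))  # ~0.77 PFLOPS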

SGI is raising the bar as well –

This means as many as 2,208 cores in a single rack of our Rackable™ rackmount servers. And in the SGI ICE Cube modular data center, our containerized data center environment, you can now scale within a single container to 41,760 cores! Of course, density is only part of the picture. There’s as much to be excited about when it comes to power efficiency and the memory performance of SGI servers using AMD Opteron 6100 Series processor technology

Other systems announced today include:

  • HP DL165G7
  • HP SL165z G7
  • HP DL385 G7
  • Cray XT6 supercomputer
  • There is mention of a Dell R815, though it doesn’t seem to be officially announced yet. The R815 specs seem kind of underwhelming in the memory department, with it only supporting 32 DIMMs (the HP systems above support the full 12 DIMMs/socket). It is only 2U however. Sun has had 2U quad socket Opteron systems with 32 DIMMs for a couple of years now in the form of the X4440; it is strange that Dell did not step up and max out the system with 48 DIMMs.

I can’t put into words how happy and proud I am of AMD for this new product launch. Not only is it an impressive technological achievement, but the fact that they managed to pull it off on schedule is just amazing.

Congratulations AMD!!!

March 28, 2010

Vulnerable Smart Grid

Filed under: News, Security — Nate @ 9:27 am

As some of you who know me may know, I have been against the whole concept of a “smart grid” for a few years now. The main reason is security. The more intelligence you put into something, especially with regard to computer technology, the more complex it becomes, and the more complex it becomes, the harder it is to protect.

Well, it seems the mainstream media has picked up on this with an article from the AP:

SAN FRANCISCO – Computer-security researchers say new “smart” meters that are designed to help deliver electricity more efficiently also have flaws that could let hackers tamper with the power grid in previously impossible ways.

Kind of reminds me of the RFID-based identification schemes that have been coming online in the past few years, which are just as prone to security issues. In the case of the smart grid, my understanding is that the goal is to improve energy efficiency by allowing the power company to intelligently inform downstream customers of power conditions, so that things like heavy appliances can be proactively turned off in the event of a surge in usage to prevent brownouts and blackouts.

Sounds nice in theory, like many things, but as someone who has worked with technology for about 20 years now, I see the quality of stuff that comes out of companies, and I just have no confidence that such technology can be made “secure” at the same time it is made “cost effective”. At least not at our current level of technological sophistication; from an evolutionary standpoint “technology” is still a baby, we’re still figuring stuff out, it’s brand new stuff. I don’t mean to knock any company or organization in particular, as they are not directly at fault; I just don’t believe that, in general, technology is ready for such a role, not in a society such as ours.

Today in many cases you can’t get a proper education in modern technology because the industries are moving too fast for the schools to keep up. Don’t get me started on organizations like OLPC and others trying to pitch laptop computers to schools in an attempt to make education better.

If you want to be green, in my opinion, get rid of the coal-fired power plants. I mean, it’s the 21st century and we still have coal generating roughly half (or more) of our electricity? Hasn’t anyone played Sim City?

Of course this concept doesn’t just apply to the smart grid; it applies to everything as our civilization tries to put technology to work to improve our lives. Whether it’s Wi-Fi, RFID, or online banking, all of these (and many others) expose us to significant security threats when not deployed properly, and in my experience, from what I have seen, the implementations that are not secure outnumber the ones that are by probably 1000:1. So we have a very significant trend of this in action (technology being deployed and then being actively exploited). I’m sure you agree that our power grid is a fairly important resource; it was declared the most important engineering achievement of the 20th century.

While I don’t believe it is possible yet, we are moving down the road where scenes like those portrayed in the movie Eagle Eye (I saw it recently, so it was on my mind) will be achievable, especially now that many nations have spun up formal hacker teams to fight future cyber wars, and you have to admit, we are a pretty tempting target.

There will be a very real cost to this continued penetration of technology into our lives. In the end I think the cost will be too high, but time will tell I guess.

You could say I long for the earlier days of technology, where for the most part security “threats” were just people who wanted to poke around in systems, or compromise a host to “share” its bandwidth and disk space for hosting pirated software. Rarely was there any real malice behind any of it; that’s not true anymore.

And for those who are wondering – the answer is no. I have never, ever had a wireless access point hooked to my home network, and I do my online banking from Linux.

March 26, 2010

Enterprise EqualLogic

Filed under: Storage — Nate @ 6:33 am

So, I attended that Dell/Denali event I mentioned recently. They covered some interesting internals on the architecture of Exchange 2010, covering technical topics like migrating to it, how it protects data, etc. It was interesting from that standpoint; they didn’t just come out and say “Hey, we are the big market leader, you will use us, resistance is futile”. So I certainly appreciated that, although honestly I don’t really deal with MS stuff in my line of work; I was mostly there for the food and because it was walking distance (and an excuse to get out of the office).

The other topic that was heavily covered was Dell EqualLogic storage. This I was more interested in. I have known about EqualLogic for years and never really liked their iSCSI-only approach (I like iSCSI, but I don’t like single-protocol arrays, and iSCSI is especially limiting as far as extending array functionality with other appliances goes; e.g. you can optionally extend a Fibre Channel-only array with iSCSI but not vice versa – please correct me if I’m wrong).

I came across another blog entry last year which I found extremely informative – “Three Years of EqualLogic” – listing some great pros and some serious and legitimate cons to the system after nearly three years of using it.

Anyways, being brutally honest, if there is anything I really “took away” from the conference with regards to EqualLogic storage it is this – I’m glad I chose 3PAR for my storage needs (and thanks to my original 3PAR sales rep for making the cold call to me many years ago; I knew him from an earlier company).

So, where to begin? I’ve had a night to sleep on this information and absorb it in a more logical way, so I’ll start with what I think are the pros of the EqualLogic platform:

  • Low cost – I haven’t priced it personally but people say over and over it’s low cost, which is important
  • Easy to use – It certainly looks very easy to use and very easy to set up; I’m sure they could get 20TB of EqualLogic storage up and running in less time than 3PAR could, no doubt.
  • Virtualized storage makes it flexible. It pales in comparison to 3PAR virtualization but it’s much better than legacy storage in any case.
  • All software is included – this is great too, no wild cards with licensing. 3PAR by contrast heavily licenses their software, and at times it can get complicated in some situations (their decision to license the zero detection abilities of their new F/T class arrays was a surprise to me)

So it certainly looks fine for low(ish) cost workgroup storage. One of the things the Dell presenter tried to hammer on is how it is “Enterprise ready”. And yes, I agree it is ready; lots of enterprises use workgroup storage, I’m sure, for some situations (probably because their real legacy enterprise storage is too expensive to add more applications to, or doesn’t scale to meet mixed workloads simultaneously).

Here’s where I get down & dirty.

As far as being really ready for enterprise storage – no way, it’s not ready. Not in 2010; maybe if it were 1999.

EqualLogic has several critical architectural deficiencies that would prevent me from wanting to use it or advising others to use it:

  • Active/passive controller design – I mean come on, in 2010 you’re still doing active/passive? They tried to argue that you don’t need to “worry” about balancing the load between controllers and then losing that performance when a controller fails. Thanks, but I’ll take the extra performance from the other active controller(s) [with automagic load balancing, no worrying required], and keep performance high with 3PAR Persistent Cache in the event of a controller failure (or software/hardware upgrade/change).
  • Need to reserve space for volumes/snapshots. Hello, 21st century here: we have the technology for reservationless systems, and ditching reservations is especially critical when dealing with thin provisioning.
  • Lack of storage pools. This compounds the effects of a reservation-based storage system. Maybe EqualLogic has storage pools; I just did not hear it mentioned in the conference nor anywhere else. Having to reserve space for each and every volume is just stupidly inefficient. At the very least you should be able to reserve a common pool of space and point multiple volumes at it to share. This again hints at their lack of a completely virtualized design. You get the sense that a lot of these concepts were bolted on after the fact and not designed into the system when you run into limitations like this.
  • No global hot spares – so the more shelves you have, the more spindles are sitting there idle, doing nothing. 3PAR by contrast does not use dedicated spares; each and every disk in the system has spare capacity on it. When a RAID failure occurs the rebuild is many-to-many instead of many-to-one, which improves rebuild times by 10x+. Also, due to this design 3PAR can take advantage of the I/O available on every disk in the array. There aren’t even dedicated parity disks; parity is distributed evenly across all drives on the system.
  • Narrow striping. They were talking about how the system distributes volumes over all of the disks in the system, so I asked them how far you can stripe, say, a 2TB volume. They said over all of the shelves if you wanted to, but there is overhead from iSCSI, because apparently you need an iSCSI session to each system that is hosting data for the volume; due to this overhead they don’t see people “wide striping” a single volume over more than a few shelves. 3PAR by contrast stripes across every drive in the system by default, and the volume is accessible from any controller (up to 8 in their high end) transparently. Data moves over an extremely high speed backplane to the controller that is responsible for those blocks. In fact the system is so distributed that it is impossible to know where your data actually is (e.g. data resides on controller 1 so I’ll send my request to controller 1), and the system is so fast that you don’t need to worry about such things anyways.
  • Cannot easily sustain the failure of a whole shelf of storage. I asked the Dell rep sitting next to me if it was possible; he said it was, but you had to have a special sort of setup, and it didn’t sound like it would be transparent to the host, perhaps involving synchronous replication from one array to another, where in the event of a failure you probably have to re-point your systems at the backup. I don’t know, but my point is I have been spoiled by 3PAR, in that by default their system uses what they call cage level availability, which means data is automatically spread out over the system to ensure the failure of a shelf does not impact system availability. This requires no planning in advance, unlike other storage systems; it is automatic. You can turn it off if you want, as there are limitations as far as what RAID levels you can use depending on the number of shelves you have (e.g. you cannot run RAID 5 with cage level availability with only 2 shelves because you need at least 3), and the system will prevent you from making mistakes.
  • One RAID level per array (enclosure), from what the Dell rep sitting next to me said. Apparently even on their high end 48-drive arrays you can only run a single level of RAID on all of the disks? That seems very limiting for an array that has such grand virtualization claims. 3PAR of course doesn’t limit you in this manner; you can run multiple RAID levels on the same enclosure, and you can even run multiple RAID levels on the same DISK, it is that virtualized.
  • Inefficient scale out – while scale out is probably linear, the overhead involved with so many iSCSI sessions across so many arrays has to carry some penalty. Ideally what I’d like to see is at least some sort of optional InfiniBand connectivity between the controllers to give them higher bandwidth and lower latency, and then do like 3PAR does – traffic can come in on any port and get routed to the appropriate active controller automatically. But their tiny controllers probably don’t have the horsepower to do that anyways.

There might be more, but those are the top offenders on my list. One part of the presentation which I didn’t think was very good was when the presenter streamed a video from the array and tested various failure scenarios. The amount of performance needed to stream a video from a storage array under failure conditions is a very weak illustration of how seamless a failure can be; pulling a hard disk, a disk controller, or a power supply really is trivial. To the uninformed I suppose it shows the desired effect (or lack of one), which is why it’s done. A better test, I think, would be running something like IOzone against the array and showing real time monitoring of IOPS and latency while doing failure testing (preferably with at least 45-50% of the system loaded).

You never know what you’re missing until you don’t have it anymore. You can become complacent and accept what you have as “good enough” because you don’t know any better. I remember feeling this especially strongly when I changed jobs a few years ago and went from managing systems in a good tier 4 facility to another “tier 4” facility which had significant power issues (it seemed like at least one major outage a year). I took power for granted at the first facility because we had gone so many years without so much as a hiccup. It’s times like this I realize (again) the value that 3PAR storage brings to the market, and I am very thankful that I can take advantage of it.

What I’d like to see, though, is some SPC-1 numbers posted for a rack of EqualLogic arrays. They say it is enterprise ready, and they talk about the clouds surrounding iSCSI. Well, put your money where your mouth is and show the world what you can do with SPC-1.

March 17, 2010

Frightened

Filed under: General, Networking — Nate @ 8:15 pm

Frightened. That was the word that first came to my mind when I read this article from our friends at The Register.

The report also says that 60 per cent of Google’s traffic is now delivered directly to consumer networks. In addition to building out a network of roughly 36 data centers and co-locating in more than 60 public exchanges, the company has spent the past year deploying its Google Global Cache (GGC) servers inside consumer networks across the globe. Labovitz says that according to Arbor’s anecdotal conversations, more than half of all consumer providers in North American and Europe now have at least one rack of Google’s cache servers.

Honestly, I am speechless beyond the word frightened; you may want to refer to an earlier blog post, “Lesser of two Evils”, for more details.

March 16, 2010

IBM partners with Red Hat for KVM cloud

Filed under: News, Virtualization — Nate @ 6:32 pm

One question: Why?

IBM has bombarded the IT world for years with claims about how they can consolidate hundreds to thousands of Linux VMs onto a single mainframe.

IBM has recently announced a partnership with Red Hat to use KVM in a cloud offering. At first I thought, well, maybe they are doing it to offer Microsoft applications as well, but that doesn’t appear to be the case:

Programmers who use the IBM Cloud for test and dev will be given RHEV to play with Red Hat Enterprise Linux or Novell SUSE Linux Enterprise Server images with a Java layer as they code their apps and run them through regression and other tests.

Let’s see: Linux and Java. Why not use the mainframes to do this? Why KVM? As far as the end users are concerned it really shouldn’t matter; after all, it’s Java and Linux.

Seems like a slap in the face to their mainframe division (I never bought into the mainframe/Linux/VM marketing myself; I suppose they don’t either). I do remember briefly having access to an S/390 running a SuSE VM about 10 years ago. It was… interesting.

Deja Vu

Filed under: News — Nate @ 9:55 am

Intel released their 5600 CPUs today; I first saw an announcement on Supermicro’s site last night (there are a dozen or two vendors whose sites I prowl regularly). I couldn’t help but get a sense of deja vu when it came to new Intel 6-core CPUs. It seems just yesterday^W nearly 2 years ago that they released their first hex core processor, the Xeon 7000 series. Yeah, I know clock for clock these new chips are much faster: new cores, more threads, etc. But purely from a core perspective, why might anyone want to go buy one of these new 5600 series systems, with the new 8 core chips coming in a couple of weeks and AMD’s new 8 and 12 core chips coming at about the same time?

I think Intel got screwed on this one, mostly screwed by their OEMs. That is, many (most? all?) of the large OEMs have adapted the upcoming 8 core Intel chips which were intended for 4-socket and greater systems to run in dual socket configurations, something Intel obviously didn’t think of when they were designing this new 5600 series chip. On the same note I never understood why the 7000 series of chips never made it into dual socket systems, but oh well it doesn’t matter now.

I think that, short of an upgrade cycle for existing 5500 series systems probably next year (since the new 5600s are socket compatible, and given the 5500s are so new I can’t imagine many customers needing to upgrade so soon), the 5600 is a dead product.

