TechOpsGuys.com Diggin' technology every day

1Jun/15

3PAR Gen5: True flash scale storage

TechOps Guy: Nate

(I'm in Vegas, waiting for HP Discover to start (got here Friday afternoon). This time around I had my own company pay my way, so HP isn't providing me with the trip. Since I haven't been blogging much the past year I didn't feel good about asking HP to cover me as a blogger.)

UPDATED - 6/4/2015

When flash first started becoming a force to be reckoned with a few years ago in enterprise storage it was clear to me controller performance wasn't up to the task of being able to exploit the performance potential of the medium. The ratio of controllers to SSDs was just way out of whack.

In my opinion this has been largely addressed in the 3PAR Generation 5 systems that are being released today.

3PAR 20k system in the flesh

NEW HP 3PAR 20800

I believe this replaces the 10800/V800 (2-8 controllers) system, and the 10400/V400 (2-4 controllers) system as well.

  • 2-8 controllers, max 96 x 2.5GHz CPU cores (12/controller) and 16 Generation 5 ASICs (2/controller)
  • 224GB of RAM cache per controller (1.8TB total - 768GB control / 1,024GB data)
  • Up to 32TB of Flash read cache
  • Up to 6PB of raw capacity (SSD+HDD)
  • Up to 15PB of usable capacity w/deduplication
  • 12Gb SAS back end, 16Gb FC (Max 160 ports) / 10Gb iSCSI (Max 80 ports) front end
  • 10 Gigabit ethernet replication port per controller
  • 2.5 Million Read IOPS under 1 millisecond
  • Up to 75 Gigabytes/second throughput (read), up to 30 Gigabytes/second throughput (write)

NEW HP 3PAR 20850

This augments the existing 7450 with a high end 8-controller capable all flash offering, similar to the 20800 but with more adrenaline.

  • 2-8 controllers, max 128 x 2.5GHz CPU cores (16/controller) and 16 Generation 5 ASICs (2/controller)
  • 448GB of RAM cache per controller (3.6TB total - 1,536GB control / 2,048GB data)
  • Up to 4PB of raw capacity (SSD only)
  • Up to 10PB of usable capacity w/deduplication
  • 12Gb SAS back end, 16Gb FC (Max 160 ports) / 10Gb iSCSI (Max 80 ports) front end
  • 10 Gigabit ethernet replication port per controller
  • 3.2 Million Read IOPS under 1 millisecond
  • 75 Gigabytes/second throughput (read), 30 Gigabytes/second throughput (write)

Even though flash is really, really fast, 3PAR leverages large amounts of cache to optimize writes to the back end media. This not only improves performance but extends SSD life. If I recall correctly nearly 100% of the cache on the all-SSD systems is write cache; reads are really cheap so very little caching is done on reads.

It's only going to get faster

One of the bits of info I learned is that the claimed throughput by HP is a software limitation at this point. The hardware is capable of much more and performance will improve as the software matures to leverage the new capabilities of the Generation 5 ASIC (code named "Harrier 2").

Magazine sleds are no more

Since the launch of the early 3PAR high end units more than a decade ago, 3PAR has leveraged custom drive magazines which allowed them to scale to 40x3.5" drives in a 4U enclosure. That is quite dense, though they did not have similar ultra density for 2.5" drives. I was told that nearline drives represent around 10% of disks sold on 3PAR, so they decided to do away with these high density enclosures and go with 2U enclosures without drive magazines. This allows them to scale to 48x2.5" drives in 4U (vs 40x2.5" in 4U before), but only 24x3.5" drives in 4U (vs 40 before). Since probably greater than 80% of the drives they ship now are 2.5" that is probably not a big deal. But as a 3PAR historian (perhaps) I found it interesting.

Along similar lines, the new 20k series systems are fully compatible with 3rd party racks. With the previous 10k series, as far as I know, the 4-node variant was 3rd party rack compatible but the 8-node variant required a custom HP rack.

Inner guts of a 3PAR 20k-series controller

The controller has dual internal SATA SSDs for the operating system (I assume some sort of mirroring is going on) and eight memory slots for data cache (max 32GB per slot), which are directly connected to the pair of ASICs (under the black heatsinks). Another six memory slots for the control cache (operating system, metadata etc.), also max 32GB per slot, are controlled by the Intel processors under the giant heat sinks.
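Those slot counts line up with the cache figures in the spec lists above. Here is a quick sanity check as a rough sketch; the function names are mine, and the idea that the 20800 reaches its lower figures with 16GB DIMMs in the same slots is purely my assumption, not something HP stated.

```python
# Figures from the post: 8 data-cache slots and 6 control-cache slots per
# controller, 32GB max per slot, up to 8 controllers per system.
SLOT_GB = 32
DATA_SLOTS = 8
CONTROL_SLOTS = 6
MAX_CONTROLLERS = 8


def cache_per_controller_gb(slot_gb=SLOT_GB):
    """Return (data, control, total) cache in GB for one controller."""
    data = DATA_SLOTS * slot_gb
    control = CONTROL_SLOTS * slot_gb
    return data, control, data + control


def cache_per_system_gb(controllers=MAX_CONTROLLERS, slot_gb=SLOT_GB):
    """Return (data, control, total) cache in GB for a whole system."""
    data, control, total = cache_per_controller_gb(slot_gb)
    return data * controllers, control * controllers, total * controllers


if __name__ == "__main__":
    # 20850 with every slot holding a 32GB DIMM
    print(cache_per_controller_gb())
    print(cache_per_system_gb())
    # ASSUMPTION: 16GB DIMMs would give the 20800 figures
    print(cache_per_system_gb(slot_gb=16))
```

With 32GB in every slot this gives 448GB per controller and 2,048GB data / 1,536GB control across eight controllers, which matches the 20850 list. Halving the DIMM size gives 1,024GB / 768GB, which matches the 20800 list.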

3PAR 20k series controller

How much does it cost?

I'm told the new 20k series is surprisingly cost effective in the market. The price point is significantly lower than earlier generation high end 3PAR systems. It's priced higher than the all flash 7450, for example, but not significantly more. Entry level pricing is said to be in the $100,000 range, and I heard a number tossed around that was lower than that. I would assume that the blanket thin provisioning licensing model of the 7000 series extends to the new 20k series, but I am not certain.

It is only available as an 8-controller capable system at this time, so it requires at least 16U of space for the controllers, since they are connected over the same sort of backplane as earlier models. Maybe in the future HP will release a smaller 2-4 controller capable system, or they may leave that to whatever replaces the 7450. I hope they come out with a smaller model because the port scalability of the 7000-series (and F and E classes before them) is my #1 complaint on that platform; having only one PCIe expansion slot per controller is not sufficient.

NEW HP 3PAR 3.84TB cMLC SSD

HP says that this is the new Sandisk Optimus MAX 4TB drive, which is targeted at read intensive applications. If a customer decides to use this drive in a non read intensive (and non 3PAR) environment then the capacity will drop to 3.2TB. So with 3PAR's adaptive sparing they are able to gain roughly 640GB of capacity on the drive while simultaneously supporting any workload without sacrificing anything.

This is double the size of the 1.92TB drive that was released last year. It will be available on at least the 20k and the 7450, most likely all other Gen4 3PAR platforms as well.

HP says this drops the effective cost of flash to $1.50/GB usable.

This new SSD comes with the same 5 year unconditional warranty that other 3PAR SSDs already enjoy.

I specifically mention the 7450 here because this new SSD effectively doubles the raw capacity of the system to 920TB of raw flash today (vs 460TB before). How many all flash systems scale to nearly a petabyte of raw flash?
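The capacity numbers in this section are easy to check with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper names are mine, and the drive count is implied by dividing the raw capacity by the drive size, not a figure HP quoted.

```python
def implied_drive_count(raw_tb, drive_tb):
    """Number of drives implied by a raw capacity and a per-drive size."""
    return round(raw_tb / drive_tb)


def raw_capacity_tb(drives, drive_tb):
    """Raw capacity in TB for a given number of identical drives."""
    return round(drives * drive_tb, 1)


def sparing_gain_gb(full_tb, reduced_tb):
    """Capacity difference in GB between the two ways of running the drive."""
    return round((full_tb - reduced_tb) * 1000)


if __name__ == "__main__":
    drives = implied_drive_count(920, 3.84)   # implied, not stated by HP
    print(drives)
    print(raw_capacity_tb(drives, 3.84))      # new 3.84TB SSD
    print(raw_capacity_tb(drives, 1.92))      # last year's 1.92TB SSD
    print(sparing_gain_gb(3.84, 3.2))         # 3.84TB on 3PAR vs 3.2TB elsewhere
```

Both the 920TB and 460TB figures work out to about 240 drives, so the doubling comes entirely from the larger SSD. The gap between 3.84TB and 3.2TB comes to 640GB per drive.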

NEW HP 3PAR Persistent Checksum

With the Generation 4 systems 3PAR had end to end T10 data integrity checking within the array itself from the HBAs to the ASICs, to the back end ports and disks/SSDs. Today they are extending that to the host HBAs, and fibre channel switches as well (not sure if this extends to iSCSI connections or not).

The Generation 5 ASIC has a new line rate SHA1 engine which replaces the line rate CRC engine in Generation 4 for even better data protection. I am not certain if persistent checksum is Generation 5 specific (given they are extending it beyond the array I really would expect it to be possible in Generation 4 as well).

NEW HP 3PAR Asynchronous Streaming Replication

I first heard about this almost two years ago at HP Storage Tech Day, but today it's finally here. HP adds another method of replication to the existing sets they already had:

  • Synchronous replication - 0 data loss (strict latency limits)
  • Synchronous long distance replication (requires 3 arrays) - 0 data loss (latency limits between two of the three arrays)
  • Asynchronous replication - as low as 5 minutes of data loss (less strict latency limits)
  • Asynchronous streaming replication - as low as 1 second of data loss (less strict latency limits)

HP compares this to EMC's SRDF async replication, which has as low as 15 seconds of data loss, vs 3PAR with as low as 1 second.

If for some reason more data comes into the system than the replication link can handle, the 3PAR will automatically go into asynchronous replication mode until the replication is caught up then switch back to asynchronous streaming.
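To make that fallback behaviour concrete, here is a toy model of it. This is a sketch of the idea as I understand it, not how 3PAR implements it: the `Mode` names, the `simulate` function and the use of a simple backlog counter are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    STREAMING = "asynchronous streaming"
    PERIODIC = "asynchronous"


def simulate(write_rates, link_rate):
    """Return the replication mode for each interval.

    write_rates: incoming data per interval (any unit, e.g. MB)
    link_rate:   what the replication link can ship per interval (same unit)
    """
    backlog = 0
    modes = []
    for rate in write_rates:
        # Anything the link cannot ship this interval is carried over.
        backlog = max(0, backlog + rate - link_rate)
        # While there is a backlog, fall back to plain async; once the
        # replication has caught up, switch back to streaming.
        modes.append(Mode.PERIODIC if backlog > 0 else Mode.STREAMING)
    return modes


if __name__ == "__main__":
    for mode in simulate([50, 150, 150, 20, 20, 20], link_rate=100):
        print(mode.value)
```

In this example the burst of 150 units per interval overwhelms the 100 unit link, the model drops to plain asynchronous mode, and it only returns to streaming a couple of intervals after the burst ends, once the backlog has drained.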

This new feature is available on all Gen4 and Gen5 systems.

NEW Co-ordinated Snapshots

3PAR has long had the ability to snapshot multiple volumes on a single system simultaneously, and it's always been really easy to use. Now they have extended this to be able to snapshot across multiple arrays simultaneously and make them application aware (in the case of VMware initially, Exchange, SQL Server and Oracle to follow).

This new feature is available on all Gen4 and Gen5 systems.

HP 3PAR Storage Federation

Up to 60PB of usable capacity and 10 Million IOPS with zero overhead

3PAR Federation

HP has talked about Storage Federation in the past. Today, with the new systems, the capacity knobs have of course been turned up a lot, and they've made it easier to use than earlier versions of the software, though they don't yet have completely automatic load balancing between arrays.

This federation is possible between all Gen4 and Gen5 systems.

Benefits from ASIC Acceleration

3PAR has always used in house custom ASICs on their systems and these are no different.

The ASICs within each HP 3PAR StoreServ 20850 and 20800 Storage controller node serve as the high-performance engines that move data between three I/O buses, a four memory-bank data cache, and seven high-speed links to the other controller nodes over the full-mesh backplane. These ASICs perform RAID parity calculations on the data cache and inline zero-detection to support the system’s data compaction technologies. CRC Logical Block Guard used by T10-DIF is automatically calculated by the HBAs to validate data stored on drives with no additional CPU overhead. An HP 3PAR StoreServ 20800 Storage system with eight controller nodes has 16 ASICs totaling 224 GB/s of peak interconnect bandwidth.
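Dividing HP's 224 GB/s figure across the ASICs and the mesh links gives a feel for the per-component numbers. Treat this as a rough sketch: how HP arrives at the peak figure is not stated above, so the per-ASIC and per-link values are just my division, not HP specs.

```python
NODES = 8
ASICS = 16
PEAK_INTERCONNECT_GBPS = 224  # GB/s, from the HP text above


def full_mesh_pairs(nodes):
    """Number of node-to-node pairs in a full mesh."""
    return nodes * (nodes - 1) // 2


def directed_links(nodes):
    """Links counted once per direction (each node has nodes-1 outbound)."""
    return nodes * (nodes - 1)


if __name__ == "__main__":
    print(full_mesh_pairs(NODES))                          # node pairs
    print(directed_links(NODES))                           # 8 nodes x 7 links
    print(PEAK_INTERCONNECT_GBPS / ASICS)                  # GB/s per ASIC
    print(PEAK_INTERCONNECT_GBPS / directed_links(NODES))  # GB/s per link
```

Eight nodes with seven links each is 56 links, and 224 GB/s spread over 56 links is a tidy 4 GB/s per link, which suggests (but does not prove) that this is how the headline number is built.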

3PAR 208x0 Controller Architecture

3par-mixedworkload

NEW Online data import from HDS arrays

You are now able to do online import of data volumes from Hitachi arrays in addition to the EMC VMAX, CX4, VNX, and HP EVA systems.

The competition

HP touts the scalability of usable and raw flash capacity of these new systems + the new 3.84TB SSD against their competition:

  • Consolidate thirty Pure Storage //m70 storage systems onto a single 3PAR 20850 (with 87% less power/cooling/space) ***
  • Consolidate eight XtremeIO storage systems onto a single 3PAR 20850 (with 62% less power/cooling/space)
  • Consolidate three EMC VMAX 400K storage systems onto a single 3PAR 20850 (with 85% less power/cooling/space)

HP also touts that their throughput numbers (75GB/second) are between two and ten times faster than the competition. The 7450 came in at only 5.5GB/second, so this is quite a step up.

*** HP revised their presentation at the last minute. Their original claims were against the Pure 450, which was replaced by the m70 on the same day as the 3PAR announcement. The numbers here are from memory from a couple of days ago, so they may not be completely accurate.

Fastest growing in the market

HP touted again 3PAR was the fastest growing all flash in the market last year. They also said they have sold more than 1,000 all flash systems in the first half which is more than Pure Storage sold in all of last year. In other talks with 3PAR folks specifically on market share they say they are #1 in midrange in Europe and #2 in Americas, with solid growth across the board consistently for many quarters now. 3PAR is still #5 in the all flash market, part of that is likely due to compression(see below), but I have no doubt this new generation of systems will have a big impact on the market.

Still to come

Compression remains a road map item, they are working on it, but obviously not ready for release today. Also this marks probably the first 3PAR hardware released in more than a decade that wasn't accompanied by SPC-1 results. HP says SPC-1 is coming, and it's likely they will do their first SPC-2 (throughput) test on the new systems as well.

Bottom line

HP continues to show that its 3PAR architecture is fully capable of embracing the all flash era and has a long life left in it. Not only are you getting the maturity of the enterprise proven 3PAR systems (over a decade at this point), but you are not having to compromise on almost anything else related to all flash (compression being the last holdout).

4 Comments
4Mar/15

Sign off ?

TechOps Guy: Nate

So I apologize (again) for not posting much, and not replying to comments recently.

I suppose it's obvious I haven't posted in a long time. I have mentioned this many times before but there really isn't much in tech that has gotten me excited in probably the past two years. I see new things and am just not interested anymore for whatever reason.

I have been spending some time with the 3PAR 7450 that I got late last year. It is a pretty cool box, but at the end of the day it's the same 3PAR I've known for the past 8 years, just with SSDs and dedupe (which is what I wanted; I needed something I felt I could rely on for the business I work for, and I have become very conservative when it comes to storage over the years).

That and there's been a lot of cool stuff going on with me outside of tech so I am mostly excited about that and have been even less focused on tech recently.

I pushed myself harder than I thought possible for more than a decade striving to be the best that I could be in the industry and think I accomplished a lot (at one point last year a former boss of mine said they hired 9 people to do my job after I left that particular company. Other positions are/were similar, perhaps not as extreme.)

I am now pushing myself harder than I ever thought possible in basically everything BUT tech, in part to attempt to make up for sacrifices made over the previous decade. So I am learning new things, just not as much in technology and I don't know how long this journey will take.

I can't put into words how excited I am.

Interesting tech areas I have spent some time on in recent months that may get a blog post at some point include:

  • LogicMonitor - the most advanced/easy to use dashboarding/graphing system I've ever come across. It does more than dashboards and graphs but to-date that is all I've used it for and it pays for itself 5x over with that alone for me. I've spent a bunch of time porting my custom monitors over to it including collecting more than 12,000 data points/minute from my 3PAR systems! I can't say enough good things about this platform from the dashboard/graphing standpoint(since that is all I use it for right now)
  • ScaleArc - Sophisticated database availability tool. I am using it for MySQL, though they support other DBs as well. Still in the very early stages of deployment.
  • HP StoreOnce - not sure I have much to write about this, since I only use it as a NAS, all of the logic is in my own scripts. But getting 33.6:1 reduction in data on 44TB of written user data is pretty sweet for me, beats the HELL out of the ZFS system I was using for this before(maybe 5:1 reduction with ZFS).
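To put the StoreOnce reduction ratio in perspective, here is what the two ratios mean in terms of space actually consumed. A small sketch with my own helper names; the 5:1 ZFS figure is the rough estimate from the list above.

```python
def stored_tb(written_tb, ratio):
    """Physical TB consumed for a given amount written at ratio:1 reduction."""
    return round(written_tb / ratio, 2)


def percent_saved(ratio):
    """Percentage of written data that never hits disk at ratio:1."""
    return round((1 - 1 / ratio) * 100, 1)


if __name__ == "__main__":
    written = 44  # TB of user data written
    print(stored_tb(written, 33.6), percent_saved(33.6))  # StoreOnce
    print(stored_tb(written, 5), percent_saved(5))        # ZFS, roughly
```

So 44TB of writes lands on about 1.3TB of disk at 33.6:1, versus close to 9TB at 5:1.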

So, this may be the last blog post for a while (or forever), I am not sure. For anyone out there still watching, thanks for reading over the years, thanks for the comments, and I wish you the best!

Filed under: Random Thought 4 Comments
7Nov/14

Two factor made easy

TechOps Guy: Nate

Sorry, been really hammered recently. I just spent the last two weeks in Atlanta doing a bunch of data center work (and the previous week or two planning for that trip), and many nights didn't get back to the hotel until after 7AM. But I got most of it done, though I still have a lot more to do remotely.

I know there have been some neat 3PAR announcements recently; I plan to try to cover them soon.

In the meantime onto a new thing to me: two factor authentication. I recently went through preparations for PCI compliance and among those things we needed two factor authentication on remote access. I had never set up nor used two factor before. I am aware of the common approach of using a keyfob or mobile app or something to generate random codes etc. Seemed kind of, I don't know, not user friendly.

In advance of this I was reading a random thread on slashdot something related to two factor, and someone pointed out the company Duo Security as one option. The PCI consultants I was working with had not used it and had proposed another (self hosted) option which involved integrating our OpenLDAP with it, along with radius and mysql and a mobile app or keyfob with codes and well it just all seemed really complicated(compounded by the fact that we needed to get something deployed in about a week). I especially did not like the having to type in a code bit. I mean it wasn't too much before that I got a support request from a non technical user trying to login to our VPN - she would login and the website would prompt her to download & install the software. She would download the software (but not install it) and think it wasn't working - then try again (download and not install). I wanted something simpler.

So enter Duo Security, a SaaS platform for two factor authentication that integrates with quite a bit of back end things including lots of SSL and IPSec VPNs (and pretty much anything that speaks Radius which seems to be standard with two factor).

They tie it all up into a mobile app that runs on several different major mobile platforms both phone and tablet. The kicker for them is there are no codes. I haven't seen any other two factor systems personally that are like this (have only observed maybe a dozen or so, by no means am I an expert at this). The ease of use comes in two forms:

Inline self enrollment for our SSL VPN

Initial setup is very simple, once the user types their username and password to login to the SSL VPN (which is browser based of course), an iframe kicks in (how this magic works I do not know) and they are taken through a wizard that starts off looking something like this

duo-choose-device

No separate app, no out of band registration process.

By comparison (and what prompted me to write this now), I just went through a two factor registration process for another company (which requires it now) that uses something called Symantec Validation & ID Protection, which is also a mobile app. Someone had to call me on the phone, I told them my Credential ID and a security code, then I had to wait for the 2nd security code and tell them that, and that registered my device with whatever they use. Compared to Duo this is a positively archaic solution.

Yet another service provider I interact with regularly recently launched (and is pestering me to sign up for) two factor authentication - they too use these old fashioned codes. I've been hit with more two factor related things in the past month than in the past probably 5 years or something.

preactivate-duo-mobile-san

Sync your phone with Duo security by scanning a QR code with your phone (obscured the QR code a bit just in case that has sensitive info in it)

By contrast the self enrollment in Duo is simple, requires no interaction on my part, users can enroll whenever they want. They can even register multiple devices on their own, and add/delete devices if they wish.

One of the times during testing I did have an issue scanning the QR code, which normally takes about 2 seconds on my phone. I was struggling with it for a minute or two, until I realized my mouse cursor was on top of it, which was blocking the scan from working. Maybe they could improve it by somehow cloaking the mouse cursor with javascript or something if it goes over the code, I don't know.

Don't have a mobile app? Duo can use those same old fashioned key codes too (by their or 3rd party keyfob or mobile app), or they can send you an SMS message, or make a voice call to you (the prompt basically says hit any button on the touch tone phone to acknowledge the 2nd factor -- of course that phone# has to be registered with them).

Simply press a button to acknowledge 2nd factor

The other easy part is there are of course no codes to have to transcribe from a device to the computer. If you are using the mobile app, upon login you get a push notification from the app (in my experience more often than not this comes in less than 2 seconds after I try to login). The app doesn't have to be running (it runs in the background even if you reboot your phone). I get a notification in Android (in my case) that looks like this:

duo-android-san

Duo integrated nicely into Android

I obscured the IP address and the company name just to try to keep this not associated with the company I work for. If you have the app running in the foreground you can see a full screen login request similar to the smaller one above. If for some reason you are not getting the push notification you can tell the app to poll the Duo service for any pending notifications (only had to do that once so far).

The mobile app also has one of those number generator things so you can use that in the event you don't have a data connection on the phone. In the event the Duo service is off line you have the option of disabling 2nd factor automatically(default) so them being down doesn't stop you from getting access, or if you prefer ultra security you can tell the system to prevent any users from logging in if the 2nd factor is not available.

Normally I am not one for SaaS type stuff - really the only exception is if the SaaS provides something that I can't provide myself. In this case the simple two factor stuff, the self enrollment, the ability to support SMS and phone voice calls (which about a half dozen of my users have opted to use) is not anything I could have set up in a short time frame anyway (our PCI consultants were not aware of any comparable solution - and they had not worked with Duo before).

Duo claims to be able to set up in just a few minutes - for me the reality was a little different. The instructions they had were only half of what I needed for our main SSL VPN, so I had to resort to instructions from our VPN appliance maker to make up the difference (and even then I was really confused, until support explained it to me. Their instructions were specifically for two factor on IOS devices though they applied to my scenario as well). For us the requirement is that the VPN device talk to BOTH LDAP and RADIUS. LDAP stores the groups that users belong to, and those groups determine what level of network access they get. RADIUS is the 2nd factor (or in the case of our IPSec VPN the first factor too, more on that in a moment). In the end it took me probably 2-3 hours to figure it out; about half of that was wondering why I couldn't log in (because I hadn't set up the VPN->LDAP link, so the authentication wasn't getting my group info and I was not getting any network permissions).

So for our main SSL VPN, I had to configure a primary and a secondary authentication, and initially with Duo I just kept it in pass through mode (only talking to them and not any other authentication source) because the SSL VPN was doing the password auth via LDAP.

When I went to hook up our IPSec VPN that was a different configuration. It did not support dual auth of both LDAP and RADIUS, though it could do LDAP group lookups and password auth with RADIUS. So I put the Duo proxy in a more normal configuration, which meant I needed another RADIUS server integrated with our LDAP (it runs on the same VM as the Duo proxy on a different port) that the Duo proxy could talk to (via localhost) in order to authenticate passwords. So the IPSec VPN would send a RADIUS request to the Duo proxy, which would then send that information to the other RADIUS server (integrated with LDAP) and to their SaaS platform, and give a single response back to allow or deny the user.
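The flow for the IPSec VPN boils down to two checks collapsed into one answer. Below is a simplified sketch of that decision, not the Duo proxy's actual code: `proxy_authenticate` and the two callables are hypothetical stand-ins for the local RADIUS/LDAP password check and the push to the SaaS platform. The only real-world names are the standard RADIUS reply types, Access-Accept and Access-Reject.

```python
ACCEPT = "Access-Accept"
REJECT = "Access-Reject"


def proxy_authenticate(username, password, primary_auth, second_factor):
    """Return a single RADIUS-style verdict for one login attempt.

    primary_auth(username, password) -> bool
        stand-in for the RADIUS server that is integrated with LDAP
    second_factor(username) -> bool
        stand-in for the push/SMS/phone approval from the SaaS side
    """
    # First factor: the password. If it is wrong there is no point
    # bothering the user's phone, so stop here.
    if not primary_auth(username, password):
        return REJECT
    # Second factor: only reached with a correct password.
    if not second_factor(username):
        return REJECT
    return ACCEPT


if __name__ == "__main__":
    def check_password(user, pw):
        return pw == "correct horse"   # dummy check for the example

    def always_approve(user):
        return True                    # pretend the user tapped "approve"

    print(proxy_authenticate("nate", "wrong", check_password, always_approve))
    print(proxy_authenticate("nate", "correct horse", check_password, always_approve))
```

The VPN never sees the two steps separately. It sends one request and gets one accept or reject back, which is why this works with devices that only support a single RADIUS auth source.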

At the end of the day the SSL VPN ends up authenticating the user's password twice (once via LDAP once via RADIUS), but other than being redundant there is no harm.

Here is what the basic architecture looks like, this graphic is more ugly than my official one since I wanted to hide some of the details, you can get the gist of it though

DuoSecurity_Deployment_sanity

Two factor authentication for SSL, IPSec and SSH with redundancy

The SSL VPN supported redundant authentication schemes, so if one Duo proxy was down it would fail over to another one. The problem was the timeout was too long; it would take upwards of 3 minutes to login (and you are in danger of the login timing out). So I set up a pair of Duo proxies and am load balancing between them with a layer 7 health check. If a failure occurs there is no delay in login and it just works better.
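The reason the load balancer works better than the VPN's built-in failover is that the health check takes a dead proxy out of rotation before any login is sent to it, so nobody sits through a timeout. A toy version of that idea follows; `ProxyPool` and the proxy names are made up for the example, and a real load balancer would run its health checks in the background instead of per request.

```python
class ProxyPool:
    """Round-robin over the proxies that currently pass a health check."""

    def __init__(self, proxies, is_healthy):
        self.proxies = list(proxies)
        self.is_healthy = is_healthy  # callable: proxy name -> bool
        self._next = 0

    def pick(self):
        """Return the proxy that should receive the next login request."""
        healthy = [p for p in self.proxies if self.is_healthy(p)]
        if not healthy:
            raise RuntimeError("no healthy proxy available")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice


if __name__ == "__main__":
    status = {"proxy-a": True, "proxy-b": True}   # hypothetical proxy names
    pool = ProxyPool(status, lambda name: status[name])
    print(pool.pick(), pool.pick())   # alternates while both are up
    status["proxy-a"] = False         # health check starts failing
    print(pool.pick(), pool.pick())   # everything goes to the survivor
```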

As the image shows I have integrated SSH logins with Duo as well in a couple of cases. There is no pretty inline self enrollment, but if you happen to not be enrolled, the two factor process will spit out a URL to put into your browser upon first login to the SSH host to enroll in two factor.

I deployed the setup to roughly 120 users a few weeks ago, and within a few days roughly 50-60 users had signed up. Internal IT said there were zero - count 'em zero - help desk tickets related to the new system, it was that easy and functional to use. My biggest concern going into this whole project was tight timelines and really no time for any sort of training. Duo Security made that possible (even without those timelines I still would have preferred this solution -- or at least this type of solution, assuming there is something else similar on the market; I am not aware of any).

My only support tickets to-date with them were two users who needed to re-register their devices(because they got new devices). Currently we are on the cheaper of the two plans which does not allow self management of devices. So I just login to the Duo admin portal, delete their phone and they can re-enroll at their leisure.

Duo's plans start as low as $1/user/month. They have a $3/user/month enterprise package which gives more features. They also have an API package for service providers and stuff which I think is $3/user/year (with a minimum number of users).

I am not affiliated with Duo in any way, not compensated by them, not bribed not given any fancy discounts.. but given I have written brief emails to the two companies that have recently deployed two factor I thought I would write this so I could point them and others to my story here to get more insight on a better way to do two factor authentication.

17Sep/14

NetApp Flash ray ships… with one controller

TechOps Guy: Nate

Well I suppose it is finally out, or at least in a "limited" way. NetApp apparently is releasing their ground-up rewrite all Flash product Flash Ray, based on a new "MARS" operating system (not related to Ontap).

When I first heard about MARS I heard some promising things, I suppose all of those things were just part of the vision, obviously not where the product is today on launch day. NetApp has been carefully walking back expectations all year. Which turned out to be a smart move, but it seems they didn't go far enough.

To me it is obvious that they felt severe market pressures and could no longer risk staying out of the market until their next gen platform was fully ready. It's also obvious that Ontap doesn't cut it for flash or they wouldn't have built Flash Ray to begin with.

But shipping a system that only supports a single controller - I don't care if it's a controlled release or not - giving any customer such a system under any circumstance other than alpha-quality testing just seems absurd.

The "vision" they have is still a good one, on paper anyway -- I'm really curious how long it takes them to execute on that vision -- given the time it took to integrate the Spinmaker stuff into Ontap. Will it take several years?

In the meantime while you're waiting for this vision to come out I wonder what NetApp will offer to get people to want to use this product vs any one of the competing solutions out there. Perhaps by the time this vision is complete this first or second generation of systems will be obsolete anyway.

Current FlashRay system seems to ship with less than 10TB of usable flash (in one system).

On a side note there was some chatter recently about an upcoming EMC XtremIO software update that apparently requires total data loss (or backup & restore) to perform. I suppose that is a sign that the platform is 1) not mature and 2) not designed right (not fully virtualized).

I told 3PAR management back at HP Discover that three years ago they could have counted me among the people who did not believe the 3PAR architecture would be able to adapt to this new era of all flash. I really didn't have confidence at that time. What they've managed to accomplish over the past two years though has just blown me away, and gives me confidence their architecture has many years of life left in it. The main bit missing still is compression - though that is coming.

My new all flash array is of course a 7450 - to start with 4 controllers and ~27TB raw flash (16x1.92TB SSDs), a pair of disk shelves so I can go to as much as ~180TB raw flash (in 8U) without adding any shelves (before compression/dedupe of course). Cost per GB is obviously low(relative to their competition), performance is high(~105k IOPS @ 90% write in RAID 10 @ sub 1ms latency - roughly 20 fold faster than our existing 3PAR F200 with 80x15k RPM in RAID 5 -- yes my workloads are over 90% write from a storage perspective), and they have the mature, battle hardened 3PAR OS (used to be named InformOS) running on it.

1 Comment
19Aug/14

Sprint screwing their subscribers again

TechOps Guy: Nate

As a former Sprint customer for more than a decade I thought this was interesting news.

My last post about Sprint was appropriately titled "Can Sprint do anything else to drive me away as a customer". I left Sprint less because I did not like them/service/etc and really more because I wanted to use the HP Pre 3 which was GSM, which meant AT&T (technically I could have used T-Mobile but the Pre 3 didn't support all of T-Mobile's 3G frequencies which meant degraded service coverage). So I was leaving Sprint regardless but they certainly didn't say or do anything that made me want to second guess that decision.

Anyway, today Sprint announces a big new fancy family plan that is better than the competition.

Except there is one glaring problem with this plan

[..]you’ll have to sign-up between Aug. 22 and Sept. 30, and current subscribers cannot apply.

Yeah, Sprint loves their customers.

On that note I thought this comment was quite interesting on El Reg:

[..]They combine Verizon-level arrogance with truly breath-taking incompetence into one slimy package. Their network stinks, it's the slowest of the Big Four (and not by a small margin, either), their customer service makes Comcast look good[..]

 

No Comments
16Aug/14

Blog spam stats

TechOps Guy: Nate

I just upgraded my Akismet plugin for the first time in a long time and this version gives me all sorts of fun stats about the spam that comes through here (they don't count my posts as SPAM but maybe they should consider that).

Anyway, the first one was somewhat startling to me. Perhaps it shouldn't be but it was anyway. I had to go back and look up when I told WordPress to close comments on posts older than 90 days (that was done entirely to limit the impact of spam; see the side bar, where I have a note about temporarily re-opening comments if you wish to comment on an older post).

Fortunately my Apache logs go back far enough, and show December 19 2013 as when I did this. Behold the impact!

Impact of disabling comments on posts older than 90 days

The last 5 months of 2013 generated 97,055 spam, while the first 8 months (so far) of 2014 have generated 6,360 spam (not even as much as August 2013 alone).
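Since the two periods are different lengths, a per-month rate makes for a fairer comparison. A quick sketch with my own helper names, using only the two totals quoted above:

```python
def monthly_rate(total, months):
    """Average spam comments per month over a period."""
    return total / months


def percent_drop(before, after):
    """How much lower the 'after' rate is, as a percentage of 'before'."""
    return round((1 - after / before) * 100, 1)


if __name__ == "__main__":
    rate_2013 = monthly_rate(97_055, 5)   # last 5 months of 2013
    rate_2014 = monthly_rate(6_360, 8)    # first 8 months of 2014
    print(rate_2013, rate_2014)
    print(percent_drop(rate_2013, rate_2014))
```

That works out to roughly 19,400 spam a month before the change against about 800 a month after, a drop of around 96%.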

Next up is the all-time spam history, which only goes back to 2012. I guess they were not collecting detailed stats before that; I have been a subscriber to this service for longer than that for sure.

TechOpsGuys spam all time


I've never really managed spam here. I rarely look at what is being blocked; there is just so much (even now).

12Aug/14Off

Some internet routers ran out of memory today

TechOps Guy: Nate

(here is a link to an in-depth analysis of the issue)

Fortunately I didn't notice any direct impact to anything I personally use. I first got a notification from one of the data center providers we use that they were having network problems; they traced it down to memory errors and frantically started planning emergency memory upgrades across their facilities. My company does not and has never relied upon this data center provider for network connectivity, so it never impacted us.

A short time later I noticed a new monitoring service that I am using sent out an outage email saying their service providers were having problems early this morning and they had migrated customers away from the affected data center(s).

Then I contacted one of the readers of my blog, whom I met a few months ago, and told him the story of my data center provider having this issue, which sounded similar to a story he told me at the time about his own data center provider. He replied with a link to this Reddit article, which talks about how the internet routing table exceeded 512,000 routes for the first time today. That is a hard limit in some older equipment, which causes it to either fail or perform really slowly, as some routes have to be processed in software instead of hardware.

I also came across this article (which I commented on), which mentions similar problems but makes no reference to BGP or routing tables (outside my comments at the bottom).

[..]as part of a widespread issue impacting major network providers including Comcast, AT&T, Time Warner and Verizon.

One of my co-workers said he was just poking around and could find no references to what has been going on today other than the aforementioned Reddit article. I too am surprised that, with so many providers having issues, this hasn't made more news.

(UPDATE - here is another article from zdnet)

I looked at the BGP routing capacity of some core switches I had literally a decade ago, and they could scale up to 1 million unique BGP4 routes in hardware, and 2 million non-unique (I'm not quite sure what the difference is; anything beyond static routing has never been my thing). I recall seeing routers, again many years ago, that could hold probably 10 times that (I think the main distinction between a switch and a router is the CPU and memory capacity, at least for the bigger boxes with dozens to hundreds of ports?).

So it's honestly puzzling to me how any service provider could be impacted by this today, and how any equipment not capable of handling 512k routes is still in use in 2014 (I can understand it for smaller orgs, but not service providers). I suppose this also goes to show that there is a widespread lack of monitoring of these sorts of metrics. In the Reddit article there is mention of talks going on for months, so people knew this was coming. Well, apparently not everyone.

Someone wasn't watching the graphs.
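Watching that particular graph doesn't have to be fancy. Below is a minimal sketch of the kind of check I have in mind, assuming you already collect a route count from your routers somehow (SNMP, CLI scraping, whatever). The 512,000 figure is the limit discussed above; the sample numbers, names and thresholds are entirely made up for illustration, and the growth estimate is a crude straight line between the first and last samples.

```python
from dataclasses import dataclass

# Hard limit discussed above for some older equipment.
HARDWARE_ROUTE_LIMIT = 512_000


@dataclass
class RouteSample:
    day: int      # days since the first sample
    routes: int   # routes in the table on that day


def headroom(routes, limit=HARDWARE_ROUTE_LIMIT):
    """Fraction of the hardware table that is still free."""
    return (limit - routes) / limit


def days_until_limit(samples, limit=HARDWARE_ROUTE_LIMIT):
    """Straight-line estimate of days left; None if the table isn't growing."""
    first, last = samples[0], samples[-1]
    span = last.day - first.day
    if span <= 0:
        return None
    growth_per_day = (last.routes - first.routes) / span
    if growth_per_day <= 0:
        return None
    return max(0.0, (limit - last.routes) / growth_per_day)


def status(samples, warn_at=0.05, limit=HARDWARE_ROUTE_LIMIT):
    """CRITICAL at or over the limit, WARNING when headroom is below warn_at."""
    latest = samples[-1].routes
    if latest >= limit:
        return "CRITICAL"
    if headroom(latest, limit) < warn_at:
        return "WARNING"
    return "OK"


# Invented sample data, NOT real routing table measurements.
samples = [RouteSample(day=0, routes=490_000), RouteSample(day=100, routes=500_000)]
print(status(samples), days_until_limit(samples))
```

With those invented samples the table is growing by 100 routes a day and sits about 2% below the limit, so the check warns with roughly 120 days to spare. That's plenty of time to plan an upgrade instead of doing it in a panic.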

I'm planning on writing a blog post soon on the aforementioned monitoring service I recently started using. I've literally spent probably five thousand hours over the past 15 years doing custom monitoring stuff, and this thing just makes me want to cry, it's so amazingly powerful and easy to use. In fact, just yesterday I had someone email me about an MRTG document I wrote 12 years ago and how it's still listed on the MRTG site even today (I asked the author to remove the link more than a year ago, which was the last time someone asked me about it; that site has been offline for 10 years but is still available in the Internet Archive).

This post was just a quickie inspired by my co-worker who said he couldn't find any info on this topic, so hey maybe I'm among the first to write about it.

19Jun/14Off

My longest road trip to-date

TechOps Guy: Nate

On Tuesday I got back from the longest road trip I've taken to date (that I've personally driven, anyway).

Pictures, in case you're interested (I managed to cut them down to roughly 600):

 

Long road trip June 2014 - 2,900 miles total


California

I decided to take the scenic route and went through Yosemite on the way over, specifically to see Glacier Point, a place I wasn't aware of and did not visit on my last trip through Yosemite last year. I ended up leaving too late, so I managed to get to Glacier Point and take some good pictures, but by the time I was back on the normal route it was pretty much too dark to take pictures of anything else in Yosemite. I sped over towards Tonopah, NV for my first night's stay before heading to Vegas the next day. That was a fun route; once I got near the Nevada border at that time of night I didn't see anyone on the road for a good 30-40 or more miles (I had to slow down on some stretches of the road, I was getting too much air! literally!). I did encounter some wildlife playing in the road, but fortunately managed to avoid casualties.

Las Vegas area

I took a ferry tour on Lake Mead, which was pretty neat (I was going to say cool, but damn was it hot as hell there; my phone claimed 120 degrees from its ambient temp sensor, while the car said it was 100). That ferry is the largest boat on the lake by far, and there weren't many people on that particular tour that day, maybe 40 or so out of the probably 250-300 that it can hold. I was surprised, given the gajillions of gallons of water right there, that the surrounding area was still so dry and barren, so the pictures I took weren't as good as I thought they might have been otherwise.

I went to the Hoover Dam for a few minutes at least. I couldn't go inside as I had my laptop bag with me (I wasn't checked into the hotel yet) and they wouldn't let me in with the bag, and I wasn't going to leave it in my car!

HP Discover

(you can see all of my Discover related posts here)

A decent chunk of the trip was spent in Las Vegas at HP Discover, where I am grateful to the wonderful folks who really made the time quite pleasant.

I probably wouldn't attend an event like Discover even though I know people at HP if it weren't for the more personalized experience that we got. I don't like to wander around show floors and go into fixed sessions, I have never gotten anything out of that sort of thing.

Being able to talk in a somewhat more private setting in a room on the show floor with various groups was helpful. I didn't learn too many new things, but was able to confirm several ideas I already had in my head.

I did meet David Scott, head of HP storage, for the first time, and ran into him again at the big HP party, where he came over and chatted with Calvin Zito and myself for a good 30 minutes. He's quite a guy; I was very impressed. I thought it was funny how he poked fun at the storage startups during the 3PAR announcements. It was very interesting to hear his thoughts on some topics. Apparently he reads most/all of my blog posts and my comments on The Register too.

We also went up on the High Roller at night, which was nice, though I couldn't take any good pictures; it was too dark and most things just ended up blurry.

All in all it was a good time, met some cool people, had some good discussions.

Arizona

I was in the neighborhood, so I decided to check out Arizona again, maybe for the last time. I was there a couple of times in the past to visit a friend who lived in the Tucson area, but he moved away early this year. I did plan to visit Sedona the last time I was in AZ, but decided to skip it in favor of NFL playoffs. So I went to AZ again in part to visit Sedona, which I had heard was pretty.

Part of the expected route to Sedona was closed off due to the recent fire(s), so I had to take a longer way around.

I also decided to visit the Grand Canyon (north end), and was expecting to visit the south end the same day, but food poisoning hit me pretty good right about the time I got to the north end, so I was only there about 45 minutes before I had to go straight back to the hotel (~200 miles away). I still managed to get some good pictures though. There is a little trail that goes out to the edge there, though for the most part it had no hand rails, which was pretty scary to me anyway, being so close to a very big drop-off.

The food poisoning settled down by Monday morning. My company had asked me to extend my stay to support a big launch (which fortunately turned out to be nothing), so I was able to get out and about and visit more places before I headed back early Tuesday morning. I went through Vegas again and made a couple of pit stops before making the long trek back home.

It was a pretty productive trip though; I got quite a bit accomplished, I suppose. One thing I wanted to do was get a picture of my car next to a "Welcome to Sedona" sign to send to one of my former bosses. There was a "secret" project at that company to move out of a public cloud, and it was so controversial that my boss gave it a code name of Sedona so we wouldn't upset people in the earlier days of the project. So I sent him that pic and he liked it :)


Car's trip meter - need some color on this blog (yes that is almost 60 hours of driving over 10 days)

One concern I had on my trip is that my car has a time bomb ticking, waiting for the engine to explode. I've been planning on getting that fixed the next time I am in Seattle; I think I am still safe for the time being given the mileage. The dealership closest to me is really bad (and I complained loudly about them to Nissan) so I will not go there, and the next closest is pretty far away. The repair is a 4-5 hour operation and I don't want to stick around. Besides, I really love the service department at the dealership where I bought my car, and I'll be back in that area soon enough anyway (for a visit).

 

15Jun/14Off

HP Discover 2014: Datacenter services

TechOps Guy: Nate

(Standard disclaimer HP covered my hotel and stuff while in Vegas etc etc...)

I should be out sightseeing but have been stuck in my hotel room here in Sedona, AZ due to the worst food poisoning I've ever had, from food I ate on Friday night.

X As a service

The trend towards "as a service", which seems to be an accounting thing more than anything else (shifting dollars to another column in the books), continues with HP's Facility as a Service.

HP will go so far as to buy you a data center (the actual building), fill it with equipment and rent it back to you for some set fee, with entry level systems starting at 150kW (which would be as few as, say, 15 high density racks). They can even manage it end to end if you want them to. I didn't realize the extent that their services go to. It requires a 5 or 10 year commitment, however (that has to do with accounting again, I believe). HP says they are getting a lot of positive feedback on this new service.
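For a sense of scale, here is a rough sketch of the rack math behind that entry level figure. The 150kW and 15 rack numbers are the ones mentioned above; the other rack densities are just hypothetical values I picked for comparison, not anything HP quoted.

```python
import math

ENTRY_LEVEL_KW = 150  # entry level facility size mentioned above


def racks_needed(total_kw, kw_per_rack):
    """Number of racks needed to use total_kw at a given per-rack density."""
    return math.ceil(total_kw / kw_per_rack)


# 15 racks at 150kW implies about 10kW per rack.
implied_density = ENTRY_LEVEL_KW / 15
print(f"implied density: {implied_density:.0f}kW per rack")

# Hypothetical densities for comparison (only 10kW matches the post's example).
for kw_per_rack in (5, 10, 20):
    print(f"{kw_per_rack}kW/rack -> {racks_needed(ENTRY_LEVEL_KW, kw_per_rack)} racks")
```

So "15 high density racks" means about 10kW per rack; at a more conservative 5kW per rack the same facility would be 30 racks.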

This is really targeted at those that must operate on premise due to regulations and cannot rely on a 3rd party data center provider (colo).

Flexible capacity

FAAS doesn't cover the actual computer equipment though, that is just the building, power, cooling etc. The equipment can either come from you or you can get it from HP using their Flexible Capacity program. This program also extends to the HP public cloud as well as a resource pool for systems.

HP Flexible Capacity program


Entry level for Flexible capacity we were told was roughly a $500k contract ($100k/year).

I thought this was a good quote:

"We have designed more than 65 million square feet of data center space. We are responsible for more than two-thirds of all LEED Gold and Platinum certified data centers, and we’ve put our years of practical experience to work helping many enterprises successfully implement their data center programs. Now we can do the same for you."

Myself I had no idea that was the case, not even close.

15Jun/14Off

HP Discover 2014: Software defined

TechOps Guy: Nate

(Standard disclaimer HP covered my hotel and stuff while in Vegas etc etc...)

I have tried to be a vocal critic of the whole software defined movement, in that much of it is hype today, has been for a while, and will likely continue to be for a while yet. My gripe is not so much about the approach; the world of "software defined" sounds pretty neat. My gripe is about the marketing behind it that tries to claim we're already there, and we are not, not even close.

I was able to vent a bit with the HP team(s) on the topic and they acknowledged that we are not there yet either. There is a vision, and there is a technology. But there aren't a lot of products yet, at least not a lot of promising products.

Software defined networking is perhaps one of the more (if not the most) mature platforms to look at. Last year I ripped pretty good into the whole idea with good points I thought, basically that technology solves a problem I do not have and have never had. I believe most organizations do not have a need for it either (outside of very large enterprises and service providers). See the link for a very in depth 4,000+ word argument on SDN.

More recently HP tried to hop on the bandwagon of Software Defined Storage, which in their view is basically the StoreVirtual VSA. To me that product doesn't fit the scope of software defined; it is just a brand propped up onto a product that was already pretty old and already running in a VM.

Speaking of which, HP considers this VSA along with their ConvergedSystem 300 to be "hyper converged", and at least the people we spoke to do not see a reason to acquire the likes of Simplivity or Nutanix (why are those names so hard to remember the spelling of..). HP says most of the deals Nutanix wins are small VDI installations and aren't seen as a threat; HP would rather go after the VCEs of the world. I believe Simplivity is significantly smaller.

I've never been a big fan of StoreVirtual myself, it seems like a decent product, but not something I get too excited about. The solutions that these new hyper converged startups offer sound compelling on paper at least for lower end of the market.

The future is software defined

The future is not here yet.

It's going to be another 3-5 years (perhaps more). In the meantime customers will get drip fed the technology in products from various vendors that can do software defined in a fairly limited way (relative to the grand vision anyway).

When hiring for a network engineer, many customers would rather opt to hire someone who has a few years of python experience than more years of networking experience because that is where they see the future in 3-5 years time.

My push back to HP on that particular quote (not quoted precisely) is that that level of sophistication is very hard (and expensive) to hire for. A good comparative mark is hiring for something like Hadoop. It is very difficult to compete with the compensation packages of the largest companies offering $30-50k+ more than smaller (even billion $) companies.

So my point is the industry needs to move beyond the technology and into products. Having a requirement of knowing how to code is a sign of an immature product. Coding is great for extending functionality, but need not be a requirement for the basics.

HP seemed to agree with this, and believes we are on that track but it will take a few more years at least for the products to (fully) materialize.

HP Oneview

(here is the quick video they showed at Discover)

I'll start off by saying I've never really seriously used any of HP's management platforms(or anyone else's for that matter). All I know is that they(in general not HP specific) seem to be continuing to proliferate and fragment.

HP Oneview 1.1 is a product that builds on this promise of software defined. In the past five years of HP pitching converged systems, seeing the demo for Oneview was the first time I've ever shown even a little bit of interest in converged.

HP Oneview was released last October I believe and HP claims something along the lines of 15,000 downloads or installations. Version 1.10 was announced at Discover which offers some new integration points including:

  • Automated storage provisioning and attachment to server profiles for 3PAR StoreServ Storage in traditional Fibre Channel SAN fabrics, and Direct Connect (FlatSAN) architectures.
  • Automated carving of 3PAR StoreServ volumes and zoning the SAN fabric on the fly, and attaching of volumes to server profiles.
  • Improved support for Flexfabric modules
  • Hyper-V appliance support
  • Integration with MS System Center
  • Integration with VMware vCenter Ops manager
  • Integration with Red Hat RHEV
  • Similar APIs to HP CloudSystem

Oneview is meant to be lightweight, and act as a sort of proxy into other tools, such as Brocade's SAN manager in the case of Fibre Channel (myself I prefer Qlogic management but I know Qlogic is getting out of the switch business). Though for several HP products, such as 3PAR and Bladesystem, Oneview seems to talk to them directly.

Oneview aims to provide a view that starts at the data center level and can drill all the way down to individual servers, chassis, and network ports.

However the product is obviously still in its early stages. It currently only supports HP's Gen8 DL systems (G7 and Gen8 BL). HP is thinking about adding support for older generations, but their tone made me think they will drag their feet long enough that it's no longer demanded by customers. The bulk of what I have in my environment today is G7; I only deployed a few Gen8 systems two months ago. Also, all of my SAN switches are Qlogic (and I don't use HP networking now), so Oneview functionality would be severely crippled if I were to try to use it today.

The product on the surface does show a lot of promise though, there is a 3 minute video introduction here.

HP pointed out you would not manage your cloud from this, but instead the other way around, cloud management platforms would leverage Oneview APIs to bring that functionality to the management platform higher up in the stack.

HP has renamed their Insight Control systems for vCenter and MS System Center to Oneview.

The goal of Oneview is automation that is reliable and repeatable. As with any such tool, it seems like you'll have to work within its constraints and go around it when it doesn't do the job.

"If you fancy being able to deploy an ESX cluster in 30 minutes or less on HP Proliant Gen8 systems, HP networking and 3PAR storage then this may be the tool for you." - me

The user interface seems quite modern and slick.

They expose a lot of functionality in an easy to use way, but one thing that struck me watching a couple of their videos is that it can still be made a lot simpler; there is a lot of jumping around to do different tasks. I suppose one way to address this might be broader wizards that cover multiple tasks in the order they should be done, or something like that.