TechOpsGuys.com Diggin' technology every day

August 7, 2013

Nth Symposium 2013 Keynote: SDN

Filed under: Networking — Nate @ 9:11 am

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

“So, SDN solves a problem for me which doesn’t exist, and never has.”

– Nate (techopsguys.com)

(I think the above quote sums up my thoughts very well, so I put it at the top; it also appears again further down)

One of the keynotes of the Nth Generation Symposium last week was from Martin Casado, who is currently a Chief Architect at VMware, and one of the inventors of OpenFlow and the SDN concept in general.

I have read bits and pieces about what Martin has said in the past; he seems like a really smart guy and his keynote was quite good. It was nice to hear him confirm many of the feelings I have on SDN in general. There are some areas where I disagree with him, mainly based on my own personal experience in the environments I have worked in – the differences are minor. My bigger beef with SDN is not even inside the scope of SDN itself, more on that in a bit.

First off, I was not aware that the term Software Defined Networking was coined on the spot by a reporter from the MIT Technology Review. Apparently this reporter, who was interviewing Martin, had just done an article on Software Defined Radio, and asked Martin what they should call this thing he had created. He didn't know, so the reporter suggested Software Defined Networking since that term was still fresh in the reporter's head. He agreed, and the term was born.

Ripping from one of his slides:

What does SDN Promise?

  • Enable rapid innovation in Networking
  • Enable new forms of network control
  • It’s a mechanism for implementers
  • Not a solution for customers

That last bit I did not notice until a few moments ago; that is great to see as well.

He says network virtualization is all about operational simplification

Martin's view of Network Virtualization

What Network Virtualization is

  • Decoupling of the services provided by a virtualized network from the physical network
  • Virtual network is a container of network services (L2-L7) provisioned by software
  • Faithful reproduction of services provided by physical network

He showed an interesting stat claiming that half of all server access ports are already virtualized, and we’re on track to get to 67% in 2 years. Also apparently 40% of virtualization admins also manage virtual switching.

Here is an interesting slide showing a somewhat complex physical network design and how that can be adapted to be something more flexible with SDN and network virtualization:

The migration of physical to virtual

Top three reasons for deploying software defined networks

  1. Speed
  2. Speed
  3. Speed

(from another one of Martin's slides – and yes, he had #1, #2 and #3 all the same; anything beyond speed was viewed as a distant reason relative to speed)

Where I stand on Martin’s stuff

So first off let me preface this with the fact that I am a customer. I have managed L2-L7 networks off and on for the past 12 years, on top of all of my other stuff. I have designed and built a few networks from the ground up. Networking has never been my primary career path. I couldn't tear apart an IP packet and understand it if my life depended on it. That being said, I have been able to go toe to toe with every "Network Engineer" I have worked with (on almost everything except analyzing packet dumps beyond the most basic of things). I don't know if that says something about me, or them, or both.

I have worked in what you might consider nothing but "web 2.0" stuff for the past decade. I have never had to support big legacy applications; everything has been modern web based stuff. In two cases it was a three tier application (web + app + db); the others were two tier. I have supported Java, PHP, Ruby and Perl apps (always on Linux).

None of the applications I supported were "web scale" (and I will argue till I am blue in the face that most (99%) of organizations will never get to web scale). The biggest scaling application was also my first application – I calculated the infrastructure growth at 1,500% (based on raw CPU capacity) over roughly three years – and to think those ~30 racks of servers could today fit into a single blade enclosure with room to spare.

What does SDN solve?

Going briefly to another keynote, this one by someone at Intel, they had this slide, which shows some of the pain they have:

Intel's network folks take 2-3 weeks to provision a service

Intel's own internal IT estimates say it takes them 2-3 weeks to provision a new service. This really makes no sense to me, but there is no description of what is involved with configuring a new service.

So going back to SDN. From what I read, SDN operates primarily at L2-L3. The firewalls/load balancers etc. are less SDN and more network virtualization, and seem to be outside the scope of core SDN (OpenFlow). To date I have not seen a single mention of the term SDN when it comes to these services from any organization. It's all happening at the switch/routing layer.

So I have to assume here for a moment that it takes Intel 2-3 weeks to provision new VLANs, perhaps deploy some new switches, or update some routes or something like that (they must use Cisco if it takes that long!).

My own network designs

Going to my own personal experience – keeping things simple. Here is a recent sample network design of mine:

Basic Network Zoning architecture

There is one major zone for the data center itself, which is a /16 (leveraging Extreme's Layer 3 Virtual switching). Within that, at the moment, are three smaller zones (I think supernet may be the right word to describe them), and within those supernets are sub zones (aka subnets, aka VLANs), in a couple of different sizes for different purposes. Some of the sub zones have jumbo frames enabled, most do not. There is a dedicated sub zone for vMotion (this VLAN has no router interface on it, in part for improved security perhaps), infrastructure management interfaces, etc. Each zone (A-C) has a sub zone dedicated to load balancer virtual IPs for internal load balancing. The load balancer is directly connected to all of the major zones. Routing to this data center (over VPN – either site to site, or end user VPN) is handled by a simple /16 route, and individual WAN-based ACLs are handled by the VPN appliance.
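To make the supernet/sub zone idea concrete, here is a minimal sketch using Python's ipaddress module. The 10.20.0.0/16 block, the /18 supernets and the /24 sub zones are illustrative assumptions; the post does not spell out the actual prefix lengths used.

    import ipaddress

    # Hypothetical data center block; the real design uses a /16 as described above.
    DC_BLOCK = ipaddress.ip_network("10.20.0.0/16")

    # Carve the /16 into /18 "supernets" and use three of them as zones A-C.
    # (/18 and /24 are illustrative sizes -- the post does not state the real ones.)
    supernets = list(DC_BLOCK.subnets(new_prefix=18))
    zones = dict(zip("ABC", supernets))

    for name, supernet in zones.items():
        # Each zone is further carved into /24 sub zones (aka subnets aka VLANs).
        sub_zones = list(supernet.subnets(new_prefix=24))
        print(f"Zone {name}: {supernet} -> {len(sub_zones)} x /24 sub zones, "
              f"first VLAN subnet {sub_zones[0]}")

    # A single summary route covers the whole data center for the VPN appliance.
    print(f"Route advertised over VPN: {DC_BLOCK}")

The point of carving everything out of one summary block up front is that the WAN/VPN side only ever needs the one /16 route, no matter how many sub zones get added later.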

There are a few misc zones in the middle for various purposes; these have no access restrictions on them at all. Well, except for the VPN client stuff – the ACLs for those are handled by the VPN appliance, not by the rest of the network.

This specific network design is not meant to be extremely high security, as that need does not exist in this organization (a realistic need anyway – I have seen on several occasions network engineers over-engineer something for security when it really was not required and, as a result, introduce massive bottlenecks into the network; this became an even greater concern for me with all servers running multiple 10GbE links). The access controls are mainly to protect against casual mistakes. Internet facing services in all zones have the same level of security, so if you happen to be able to exploit one of them (I've never seen this happen at any company on anything I've been responsible for – not that I go to paranoid lengths to secure things either), there's nothing stopping you from exploiting the others in the exact same way. Obviously nothing is directly connected to the internet other than the load balancer (which runs a hardened operating system) and a site to site VPN appliance (also hardened).

The switch blocks inbound TCP SYN and UDP packets between the respective zones above – since the switch is not stateful, blocking SYNs is how new connections get denied while established/return traffic is still allowed. The switch operates at line rate 10GbE with ASIC-based ACLs. Performing this function in a hardware (or software) firewall I figured would add too much complexity and reduce performance, not to mention the potential cost of a firewall capable of line rate 10Gbps+. Given multiple servers each with multiple 10GbE ports, the possibility exists of throughput far exceeding 10Gbps; with the switch it is line rate on every port – up to 1.2Tbps on this switching platform. How much is that firewall again?
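A quick back-of-the-envelope calculation shows why I keep this filtering in the switch ASICs rather than in a firewall. The server and port counts below are illustrative assumptions, not the actual inventory.

    # Rough east-west bandwidth math for inter-zone filtering.
    servers = 20                  # assumed server count -- illustrative only
    ports_per_server = 2          # dual 10GbE per server, as described above
    gbps_per_port = 10

    potential_east_west = servers * ports_per_server * gbps_per_port   # 400 Gbps
    switch_fabric_gbps = 1200     # line-rate fabric on this switching platform
    typical_10g_firewall = 10     # a single 10Gbps-class firewall

    print(f"Potential east-west traffic: {potential_east_west} Gbps")
    print(f"Switch fabric capacity:      {switch_fabric_gbps} Gbps")
    print(f"One 10G firewall:            {typical_10g_firewall} Gbps")
    # Hundreds of Gbps of potential traffic vs a 10Gbps firewall box makes
    # the bottleneck (and the price tag to avoid it) obvious.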

There are four more VLANs related to IP-based storage – two for production and two for non-production, though the non-production ones have never really been used to date. I have the 3PAR iSCSI on these VLANs, with jumbo frames (the purpose of the VLANs), though all of the client systems at the moment use standard frame sizes (iSCSI runs on top of TCP, which negotiates the maximum segment size, so mixed frame sizes interoperate).

There is a pair of hardware load balancers, each with a half dozen or so VLANs; each zone has a dedicated load balancer VLAN for the services in that zone. The LBs are also connected to the internet of course, in a two-armed configuration.

Sample two-arm configuration for a LB from Citrix documentation

I have a similar configuration in another data center using a software load balancer of the same type – however the inability to support more than 4 NICs (4 VLANs at least in vSphere 4.1 – not sure if this is increased in 5.x) limits the flexibility of that configuration relative to the physical appliances, so I had to make a few compromises in the virtual’s case.

So I have all these VLANs, a fully routed layer 3 switching configuration, some really basic ACLs to prevent certain types of communication, load balancers to route traffic from the internet as well as distribute load in some cases.

Get to the point already!

The point of all of this is that things were designed up front and provisioned up front, and as a result over the past 18 months we have not had to make any changes to this configuration despite more than doubling in size during that time. We could double again and not have a problem. Doubling again beyond that I may need to add one or two VLANs (sub zones), though I believe the zones as they exist today could continue to exist; I would not have to expand them. I really do not think the organization running this will ever EVER get to that scale. If they do then they're doing many billions in revenue a year and we can adapt the system if needed (and probably at that point we'd have one or more dedicated network engineers who'd likely promptly replace whatever I have built with something significantly more – overly so – complicated, because they can).

If we are deploying a new application, or a new environment we just tell VMware where to plop the VM. If it is QA/Dev then it goes in that zone, if it is testing, it goes in another, production etc.. blah blah…

More complexity outside switching+routing

From a network infrastructure perspective, the complexity when deploying a new network service really lies in the load balancer. Not that it is complicated, but that stuff is not pre-provisioned up front. Tasks include:

  • Configuring server name to IP mappings (within the LB itself)
  • Creating Service group(s) & adding servers to the service groups
  • Creating virtual server(s) & assigning IPs + DNS names to them
  • Creating content switching virtual server(s) & assigning IPs + DNS names to them
  • Configuring content switching virtual server(s) – (adding rules to parse HTTP headers and route traffic accordingly)
  • Importing SSL cert(s) & assigning them to the virtual servers & cs virtual servers

The above usually takes me maybe 5-20 minutes, depending on the number of things I am adding. Some of it I may do via the GUI, some via the CLI.

None of this stuff is generic; unless we know specifically what is coming we can't provision it in advance (I'm a strong believer in solid naming conventions – which means no random names!).

The VMs by contrast are always very generic (other than the names of course); there's nothing special about them – drop them in the VLAN they need to be in and they are done. We have no VMs that I can think of with more than one vNIC, other than the aforementioned software load balancers. Long gone are the days (for me) where a server was bridged between two different networks – that's what routers are for.

Network is not the bottleneck for deploying a new application

In fact, in my opinion the most difficult part of getting a new application up and running is getting the configuration into Chef. That is by far the longest part of any aspect of the provisioning process. It can take me, or even us, hours to days to get it properly configured and tested. VMs take minutes, the load balancer takes minutes. Obviously a tool like Chef makes it much easier to scale an existing application since the configuration is already done. This blog post is all about new applications or network services.

Some of the above could be automated using the APIs on the platform (they've been there for years), and some sort of dynamic DNS or whatever (a rough sketch of what that might look like is below). The amount of work involved to build such a system for an operation of our scale isn't worth the investment.
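For illustration only, here is a minimal sketch of what scripting those load balancer steps against a REST API might look like. The endpoint paths, field names and credentials are hypothetical placeholders (not the actual NetScaler/Nitro API); it simply shows that the scripting itself is not the hard part – knowing what to name and where to point things is.

    import requests

    LB_API = "https://lb.example.com/api/v1"   # hypothetical management endpoint
    session = requests.Session()
    session.auth = ("apiuser", "apipassword")  # placeholder credentials

    def create_service(app, servers, vip, port=80):
        """Create a service group, bind servers to it, and stand up a virtual server.

        Resource names follow a strict naming convention rather than random names.
        All endpoint paths here are illustrative, not a real load balancer API.
        """
        sg = f"sg-{app}-http"
        vs = f"vs-{app}-http"

        session.post(f"{LB_API}/servicegroups", json={"name": sg, "protocol": "HTTP"})
        for name, ip in servers.items():
            session.post(f"{LB_API}/servers", json={"name": name, "ip": ip})
            session.post(f"{LB_API}/servicegroups/{sg}/bindings",
                         json={"server": name, "port": port})
        session.post(f"{LB_API}/vservers",
                     json={"name": vs, "vip": vip, "port": port, "servicegroup": sg})

    create_service("billing", {"web01": "10.20.1.11", "web02": "10.20.1.12"},
                   vip="10.20.3.10")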

The point here is that the L2/L3 stuff is trivial – at least at the scale we run today – and that goes for all of the companies I have worked at for the past decade. The L2/L3 stuff flat out doesn't change very often, and doesn't need to. Sometimes if there are firewalls involved perhaps some new holes need to be poked in them, but that just takes a few minutes — and from what I can tell is outside the scope of SDN anyway.

I asked Martin a question on that specific topic. It wasn't well worded but he got the gist of it. My pain when it comes to networking is not in the L2/L3 area – it is in the L7 area. Well, if we made extensive use of firewalls then L3 firewalling would be an issue as well. So I asked him how SDN addresses that (or does it). He liked the question and confirmed that SDN does not in fact address that. That area should be addressed by a "policy management tool" of some kind.

I really liked his answer – it just confirms my thoughts on SDN are correct.

Virtual Network limitations

I do like the option of being able to have virtual network services, whether it is a load balancer or firewall or something. But those do have limitations that need to be accounted for, whether it is performance, flexibility (# of VLANs etc) or dependency (you may not want your VPN device in a VM – if your storage shits itself you may lose VPN too!). Managing 30 different load balancers may in fact be significantly more work than managing a single load balancer that supports 30 applications (I'd wager it is – the one exception being the service provider model where you are delegating administrative control to others, which still means more work is involved, it is just being handled by more staff).

Citrix Netscaler Cluster Traffic Flow

Above is a diagram from Citrix, from an earlier blog post I wrote last year. At the time their clustering tech scaled to 32 systems, which, if that still holds true today, at the top end of 120Gbps per system would be nearly 4Tbps of theoretical throughput. Maybe cut that in half to be on the safe side, so roughly 2Tbps… that is quite a bit.

Purpose built hardware network devices have long provided really good performance and flexibility. Some of them even provide some layer of virtualization built in – this is pretty common in firewalls. More than one load balancing company has appliances that can run multiple instances of their software as well, in the event that is needed. I think the number of instances that would be required (outside of a service provider giving each customer their own LB) is quite limited.

Design the network so when you need such network resources you can route to them easily – it is a network service after all, addressable via the network – it doesn’t matter if it lives in a VM or on a physical appliance.

VXLAN

One area that I have not covered with regards to virtualization is something that VXLAN offers, which is making the L2 network more portable between data centers and such. This is certainly an attractive feature to have for some customers, especially if perhaps you rely on something like VMware's SRM to provide fail over.

My own personal experience says VXLAN is not required, nor is SRM. Application configurations for the most part are already in a configuration management platform. Building new resources at a different data center is not difficult (again, in my experience, for most of the applications I have supported this could even be done in advance), with different IP space and slightly different host names (I leverage a common airportcode.domain for each DC to show where each system is physically located). Replicate the data that is needed (use application based replication where available, e.g. internal database replication – obviously that does not include running VM images) and off you go. Some applications are more complex; most web-era applications are not though.

So, SDN solves a problem for me which doesn’t exist, and never has.

I don't see it existing in the future for most smaller scale (sub hyper-scale) applications, unless your network engineers are crazy about over engineering things. I can't imagine what is involved that takes 2-3 weeks to provision a new network service at Intel. I really can't. Other than perhaps procuring new equipment, which can be a problem regardless.

Someone still has to buy the hardware

Which leads me into a little tangent. Just because you have cloud doesn't mean you automatically have unlimited capacity. Even if you're Intel, if someone internally built something on their cloud platform (assuming they have one) and said "I need 100,000 VMs each with 24 CPUs and I plan to drive them to 100% utilization 15 hours a day", even with cloud, I think it is unlikely they have that much capacity provisioned as spare just sitting around (and if they do, that is fairly wasteful!).

Someone has to buy and provision the hardware, whether it is in a non cloud setup, or in a cloud setup. Obviously once provisioned into a pool of “cloud” (ugh) it is easier to adapt that system to be used for multiple purposes. But the capacity has to exist, in advance of the service using it. Which means someone is going to spend some $$ and there is going to be some lead time to get the stuff in & set it up. An extreme case for sure, but consider if you need to deploy on the order of 10s of thousands of new servers that lead time may be months, to get the floor space/power/cooling alone.
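To put a rough number on that extreme example, here is the back-of-the-envelope math; the host size is an assumption since none is given above.

    # Capacity math for the "100,000 VMs x 24 CPUs at 100% utilization" example.
    vms = 100_000
    vcpus_per_vm = 24
    cores_per_host = 32      # assumed dual-socket host size -- illustrative only
    overcommit = 1.0         # no overcommit, since the VMs run flat out

    hosts_needed = (vms * vcpus_per_vm) / (cores_per_host * overcommit)
    print(f"vCPUs requested: {vms * vcpus_per_vm:,}")
    print(f"Hosts needed at {cores_per_host} cores each: {hosts_needed:,.0f}")
    # 2,400,000 vCPUs -> roughly 75,000 physical hosts sitting around as spare capacity.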

I remember a story I heard from SAVVIS many years ago. A data center they operated in the Bay Area had a few tens of thousands of square feet available, and it was growing slow and steady. One day Yahoo! walked in and said: I want all of your remaining space. Right now. And poof, it was given to them. There was also a data center company Microsoft bought (I forget who now) with one or more facilities up in the Seattle area where (I believe) they kicked out the tenants of the company they bought so they could take over the facility entirely (I don't recall how much time they granted the customers to GTFO, but I don't recall hearing they were polite about it).

So often — practically all the time — when I see people talk about cloud, they think that the stuff is magical and that no matter how much capacity you need it just takes minutes to be made available (see the Intel slide above). Now if you are a massive service provider like Amazon, Microsoft or Google, you probably do have 100,000 systems available at any given time. The costs of public cloud, though, are not something I will dive into again in this post; I have talked about that many times in the past.

Back to Martin’s Presentation

Don’t get me wrong — I think Martin is a really smart guy and created a wonderful thing. My issue isn’t with SDN itself, it’s much more with the marketing and press surrounding it, making it sound like everyone needs this stuff! Buy my gear and get SDN!! You can’t build a network today without SDN!! Even my own favorite switching company Extreme Networks can’t stop talking about SDN.

Networking has been boring for a long time, and SDN is giving the industry something exciting to talk about. Except that it’s not exciting – at least not to me, because I don’t need it.

Anyway, one of Martin's last slides is great as well:

Markitechture war with SDN

Self-explanatory; I especially like the SDN/Python point.


Conclusion

I see SDN as a great value primarily for service providers and large scale operations at this point, especially in situations where providers are provisioning dedicated network resources for each customer (network virtualization works great here too).

At some point, perhaps when SDN matures more and becomes more transparent, mere mortals will probably find it more useful. As Martin says in one of his first slides, SDN is not for customers (me?), it's for implementers (that may be me too depending on what he means there, but I think it's more for the tool builders – people who make things like cloud management interfaces, vCenter programmers, etc.).

Don't discount the power/performance benefits of ASICs too much. They exist for a reason; if network manufacturers could build 1U switches to shift 1+Tbps of data around with nothing more than x86 CPUs and a reasonable power budget, I have no doubt they would. Keep this in mind when you think about a network running in software.

If you happen to have a really complicated network then SDN may provide some good value there. I haven't worked in such an organization, though my first big network (my biggest) was a bit complicated (though it was simpler than the network it replaced). I learned some good things from that experience and adapted future designs accordingly.

I'll caveat all of this by saying that the network design work I have done has been built for modern web applications. I don't cover ultra high security things like, say, processing credit cards (IMO that would call for a completely physically separate infrastructure for that subsystem, to limit the impact of PCI and other compliance requirements – that being said, my first network did process credit cards directly, though this was before PCI compliance existed, and there were critical flaws in the application with regards to credit card security at the time as well). Things are simple and fairly scalable (it is not difficult to get to low thousands of systems easily, and that already eclipses the networks of most organizations out there by a big margin).

I believe that if you're constantly making changes to your underlying L2/L3 network (other than, say, adding physical devices to support more capacity) then you probably didn't design it right to begin with (maybe not your fault). If you need to deploy a new network service, just plug it in and go.

For myself – my role has always been a hybrid of server/storage/network/etc. management, so I have visibility into all layers of the application running on the network. Perhaps that makes me better equipped to design things than someone who is in a silo and has no idea what the application folks are doing.

Maybe an extreme example, but now that I wrote that I remember back many years ago, we had a customer who was a big telco, and their firewall rule change process was: once a month, a dozen or more people from various organizations (internal + external) get on a conference call to coordinate firewall rule changes (and to test connectivity post-change). It was pretty crazy to see. You probably would have had to get the telco's CEO approval to get a firewall change in outside that window!

Before I go let me give a shout out to my favorite L3 switching fault tolerance protocol: ESRP.

I suppose the thing I hesitate most about with this post is paranoia about missing some detail which invalidates every network design I've ever done and makes me look like even more of an idiot than I already am!! Though I have talked with enough network people over the years that I don't believe that will happen…

If you're reading this and are intimately familiar with an organization that takes 2-3 weeks to spin up a network service, I'd like to hear from you (publicly or privately) as to the details of what specifically takes the bulk of that time. Anonymous is fine too, and I won't write anything about it if you don't want me to. I suspect the bulk of the time is red tape – processes, approvals, etc. – and not the technology.

So, thanks Martin for answering my questions at the conference last week! (I wonder if he will read this… some folks have Google Alerts for things that are posted and such.) If you are reading this and are wondering – yes, I really have been a VMware customer for 14 years, going back to pre-1.0 days when I was running VMware on top of Linux. I still have my CD of VMware 1.0.2 around here somewhere — I think that was the first physical media they distributed. Though my loyalty to VMware has eroded significantly in recent years for various reasons.

August 2, 2013

HP Storage Tech Day – bits and pieces

Filed under: Storage — Nate @ 9:56 am

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

For my last post on HP Storage Tech Day, here are the remaining topics that were only briefly covered at the event.

HP Converged Storage Management

There wasn't much here other than a promise to build YASMT (Yet Another Storage Management Tool) – though this time it will be really good. HP sniped at EMC on several occasions for the vapor-ness of ViPR, though at least that is an announced product with a name. HP has a vision, but no finalized name, no product (I'm sure they have something internally) and no dates.

I suppose if you're in the software defined storage camp which is for the separation of data and control planes, this may be HP's answer to that.

HP Converged Storage Management Strategy

The vision sounds good as always, time will tell if they can pull it off. The track record for products like this is not good. More often than not the tools lower the bar on what is supported to some really basic set of things, and are not able to exploit the more advanced features of the platform under management.

One question I did ask is whether or not they were going to re-write their own tools to leverage these new common APIs, and the answer was sort of what I expected – they aren't. At least short term, the tools will use a combination of these new APIs as well as whatever methods they use today. This implies that only a subset of functionality will be available via the APIs.

In contrast I recall reading something, perhaps a blog post, about how NetApp's tools use all of their common APIs (I believe end to end API stuff for them is fairly recent). HP may take a couple of years to get to something like that.

HP Openstack Integration

HP is all about Openstack. They seem to be living and breathing it. This is pretty neat – I think Openstack is a good movement, though the platform still needs some significant work to mature.

I have concerns – short term concerns – about HP's marketing around Openstack and how easy it is to integrate into customer environments. Specifically, Openstack is a fast moving target, lacks maturity, and at least as recently as earlier this year lacked a decent community of IT users (most of it was centered on developers – probably still is). HP's response is that they are participating deeply within the community (which is good long term), and are being open about everything (also good).

I specifically asked if HP was working with Red Hat to make sure the latest HP contributions (such as 3PAR support and Fibre Channel support) were included in the Red Hat Openstack distribution. They said no, they are working with the community, not with partners. This is of course good and bad. Good in that they are being open, bad in that it may result in some users not getting things for 12-24 months because the distribution of Openstack they chose is too old to support it.

I just hope that Openstack matures enough that it gets a stable set of interfaces. Unlike, say, the Linux kernel driver interfaces, which just annoy the hell out of me (I have written about that before). Compatibility, people!!!

Openstack Fibre Channel support based on 3PAR

HP wanted to point out that the Fibre Channel support in Openstack was based on 3PAR. It is a generic interface and there are plugins for a few different array types. 3PAR also has iSCSI support for Openstack as of a recent 3PAR software release as well.

StoreVirtual was first Openstack storage platform

Another interesting tidbit is that StoreVirtual was the first(?) storage platform to support Openstack. Rackspace used it (maybe still does, not sure), and contributed some stuff to make it better. HP also uses it in their own public cloud (not sure if they mentioned this or not, but I heard it from a friend who used to work in that group).

HP Storage with Openstack

Today HP integrates with Openstack at the block level on both the StoreVirtual and 3PAR platforms. Work is in progress for StoreAll, which will provide file and object storage. As far as HP goes, Fibre Channel support is available on the 3PAR platform only – StoreVirtual supports Fibre Channel, but not with Openstack (yet, anyway; I assume support is coming).

This contrasts with the competition, most of whom have no Openstack support and haven’t announced anything to be released anytime soon. HP certainly has a decent lead here, which is nice.

HP Openstack iSCSI/FC driver functionality

All of HP’s storage work with Openstack is based on the Grizzly release which came out around April 2013.

  • Create / Delete / Attach / Detach volumes
  • Create / Delete Snapshots
  • Create volume from snapshot
  • Create cloned volumes
  • Copy image to volume / Copy volume to image (3PAR iSCSI only)
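For context, here is roughly what that driver functionality looks like from the consumer side, using the Grizzly-era python-cinderclient. The credentials and endpoint are placeholders and the exact parameter names are from memory, so treat this as an illustrative sketch rather than a definitive example.

    from cinderclient.v1 import client

    # Placeholder Keystone credentials/endpoint -- substitute your own.
    cinder = client.Client("myuser", "mypassword", "myproject",
                           "http://keystone.example.com:5000/v2.0",
                           service_type="volume")

    # Create a 50GB volume; Cinder hands the request to the configured backend
    # driver (e.g. the HP 3PAR FC/iSCSI driver), which carves it out of the array.
    vol = cinder.volumes.create(50, display_name="app01-data")

    # Snapshot the volume, then create a new volume from that snapshot.
    snap = cinder.volume_snapshots.create(vol.id, display_name="app01-data-snap1")
    clone = cinder.volumes.create(50, snapshot_id=snap.id,
                                  display_name="app01-data-from-snap")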

New things coming in Havana release of Openstack from HP Storage

  • Better session management within the HP 3PAR StoreServ Block Storage Drivers
  • Re-use of existing HP 3PAR Host entries
  • Support multiple 3PAR Target ports in HP 3PAR StoreServ Block Storage iSCSI Driver
  • Support Copy Volume To Image & Copy Image To Volume with Fibre Channel Drivers (Brick)
  • Support Quality of Service (QoS) setting in the HP 3PAR StoreServ Block Storage Drivers
  • Support Volume Sets with predefined QoS settings
  • Update the hp3parclient Python library

Fibre channel enhancements for Havana and beyond

Fibre Channel enhancements for Openstack Havana and beyond

Openstack portability

This was not at the Storage Tech Day, but I was at a breakout session at the conference that talked about HP and Openstack, and one of the key points they hit on was the portable nature of the platform: run it in house, run it in CloudSystem, run it at service providers, and move your workloads between them with the same APIs.

Setting aside for a moment the fact that the concept of cloud bursting is a fantasy for 99% of organizations out there (your applications have to be able to cope with it – you're not likely going to be able to scale your web farm by bursting into a public cloud when those web servers have to hit databases that reside over a WAN connection; the latency hit will make for a terrible experience).

Anyway, setting that concept aside, you still have a serious short term problem of compatibility across different Openstack implementations, because different vendors are choosing different points to base their systems off of. This is obviously due to the fast moving nature of the platform and when each vendor decides to launch their product.

This should stabilize over time, but I felt the marketing message on this was a nice vision; it just didn't represent any reality I am aware of today.

HP contrasted this with being locked in to, say, the vCloud API. I think there are more clouds, public and private, using vCloud than Openstack at this point. But in any case I believe use cases for the common IT organization to transparently leverage these APIs to burst onto any platform – VMware, Openstack, whatever – are still years away from reality.

If you use Openstack's API, you're locked into their API anyway. I don't leverage APIs myself (directly) – I am not a developer – so I am not sure how easy it is to move between them. I think the APIs are likely much less of a barrier than the feature set of the underlying cloud in question. Who cares if the API can do X and Y if the provider's underlying infrastructure doesn't yet support that capability?

One use case that could be done today, which HP cited, is running development in a public cloud and then pulling that back in house via the APIs. Still, that one is not all that useful either. The amount of work involved in rebuilding such an environment internally should be fairly trivial anyway (the bulk of the work should be in the system configuration area; if you're using cloud you should also be using some sort of system management tool, whether it is something like CFEngine, Puppet, Chef, or similar). That, and – this is important in my experience – development environments tend not to be resource intensive, which makes it great to consolidate them on internal resources (even ones that share with production – I have been doing this for six years already).

My view on Openstack

At least one person at HP I spoke with believes most stuff will be there by the end of this year, but I don't buy that for a second. I look at things like Red Hat's own Openstack distribution taking seemingly forever to come out (I believe it's a few months behind already and I have not seen recent updates on it), and Rackspace abandoning their promise to support 3rd party Openstack clouds earlier this year.

All of what I say is based on what I read — I have no personal experience with Openstack (nor do I plan to get immediate experience; the lack of maturity of the product is keeping me away for now). Based on what I have read, conferences (I was at a local Red Hat conference last December where they covered Openstack – that's when reality really hit me; I learned a good deal about it and honestly lost some enthusiasm for the project), and some folks I have chatted/emailed with, Openstack is still VERY much a work in progress, evolving quickly. There's really no formal support community in place for a stable product; developers want to stay at the leading edge and that's what they are willing to support. Red Hat is off in one corner trying to stabilize the Folsom release from last year to make a product out of it, HP is in another corner contributing code to the latest versions of Openstack that may or may not be backwards compatible with Red Hat or other implementations.

It's a mess… it's still a young project, so that's sort of to be expected, though there are a lot of folks making noise about it. The sense I get is that if you are serious about running an Openstack cloud today, as in right now, you had best have some decent developers in house to help manage and maintain it. When Red Hat comes out with their product it may solve a bunch of those issues, but it will still be a "1.0", and there's always some not insignificant risk in investing in that without a very solid support structure inside your organization (Red Hat will of course provide support, but I believe that won't be enough for most).

That being said, it sounds like Openstack has a decent future ahead of it – with such a large number of industry players adopting support for it, it's really only a matter of time before it matures and becomes a solid platform for the common IT organization to deploy.

How much time? I’m not sure. My best guesstimate is I hope it can reach that goal within five years. Red Hat, and others should be on versions 3 and perhaps 4 by then. I could see someone such as myself starting to seriously dabble in it in the next 12-16 months.

Understand that I’m setting the bar pretty high here.

Last thoughts on HP Storage Tech Day

I had a good time, and thought it was a great experience. They had very high caliber speakers, were well organized, and the venue was awesome as well. I was able to drill them pretty good; the other bloggers seemed to really appreciate that I was able to drive some of the technical conversations. I'm sure there were some questions they would have rather not answered, since the answers weren't always "yes, we've been doing that forever!", but they were honest and up front about everything. When they could not be, they said so ("can't talk about that here, we need a Nate Disclosure Agreement").

I haven’t dealt much at all with the other internal groups at HP, but I can say the folks I have dealt with on the storage side have all been AWESOME. Regardless of what I think about whatever storage product they are involved with they are all wonderful people both personally and professionally.

These past few posts have been entirely about what happened on Monday. There are more bits that happened at the main conference on Tues-Thur, and once I get copies of those slide decks I'll be writing more about that – there were some pretty cool speakers. I normally steer far clear of such events; this one was pretty amazing though. I'll save the details for the next posts.

I want to thank the team at HP and Ivy Worldwide for organizing/sponsoring this event – it was a day of nothing but storage (and we literally ran out of time at the end; one or two topics had to be skipped). It was pretty cool. This is the first event I've ever traveled for, and the only event where there was some level of sponsorship (as mentioned, HP covered travel, lodging and food costs).

July 31, 2013

HP Storage Tech Day – StoreAll, StoreVirtual, StoreOnce

Filed under: Storage — Nate @ 5:31 pm

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

On Monday I attended a private little HP Storage Tech Day event here in Anaheim for a bunch of bloggers. They also streamed it live, and I believe the video is available for download as well.

I wrote a sizable piece on the 3PAR topics which were covered in the morning, here I will try to touch on the other HP Storage topics.

HP StoreAll + Express Query

StoreAll Platform

HP doesn't seem to talk about this very much, and as time goes on I have come to understand why. It's not a general purpose storage system – I suppose it never has been (though I expect Ibrix tried to make it one in their days of being independent). They aren't going after NetApp or EMC's enterprise NAS offerings. It's not a platform you want to run VMs on top of. Not a platform you want to run databases on top of. It may not even be a platform you want to run HPC stuff on top of. It's built for large scale bulk file and object storage.

They have tested scalability to 1,024 nodes and 16PB within a single namespace. The system can scale higher; that's just the tested configuration. They say it can scale non-disruptively and re-distribute existing data across new resources as those resources are added to the system. StoreAll can go ultra dense with nearline SAS drives, up to roughly 500 drives in a rack (without any NAS controllers).

It’s also available in a gateway version which can go in front of 3PAR, EVA and XP storage.

They say their main competition is EMC Isilon, which is another scale-out offering.

HP seems to have no interest in publishing performance metrics, including participating in SPECsfs (a benchmark that sorely lacks disclosures). The system has no SSD support at all.

The object store and file store, if I am remembering right, are not shared, so you have to access your data via one consistent method. You can't have an application upload data to the object store and then give a user the ability to browse to that file using CIFS or NFS. To me this would be an important function to serve if your object and file stores are in fact on the same platform.

By contrast, I would expect a scale out object store to do away with the concept of disk-based RAID and go with object level replication instead – many large scale object stores do this already. I believe I even read in El Reg that HP Labs is working on something similar (nothing around that was mentioned at the event). In StoreAll's case they are storing your objects in RAID, while denying you the flexibility to access them over file protocols, which is unfortunate.

From a competitive standpoint, I am not sure what features HP may offer that are unique from an object storage perspective that would encourage a customer to adopt StoreAll for that purpose. If it were me I would probably take a good hard look at something like Red Hat Storage server(I would only consider RHSS for object storage, nothing else) or other object offerings if I was to build a large scale object store.

Express Query (below) cannot currently run on object stores; it will with a future release though.

Express Query

This was announced a while back. It seems to be a SQL database of sorts running on the storage controllers, with some hooks into the file system itself. It provides indexes of common attributes and gives the user the ability to define custom attributes to search by. As a result, obviously, you don't have to crawl the entire file system to find files that match these criteria.

It is exposed as a RESTful API, which has its ups and downs. As an application developer you can take this and tie it into your application. It returns results in JSON format (developer friendly, hostile to users such as myself – remember my motto: "if it's not friends with grep or sed, it's not friends with me").

The concept is good; perhaps the implementation could use some more work, as-is it seems very developer oriented. There is a Java GUI app you can run that helps you build and submit a query to the system, which is alright. I would like to see a couple more things:

  • A website on the storage system (with some level of authentication – you may want to leave some file results open to the “public” if those are public shares) that allows users to quickly build a query using a somewhat familiar UI approach.
  • A drop-in equivalent to the Linux find command (see the sketch after this list). It would only support a subset of find's functionality, but I believe you could get a large portion of the value fairly simply with this. The main point being: don't make users significantly alter their processes to adopt this feature – it's not complicated, lower the bar for adoption.
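To illustrate, a minimal sketch of such a wrapper is below. The query endpoint, parameters and JSON field names are hypothetical stand-ins, not the real Express Query API; the point is only that a thin shim can give shell users familiar, grep-friendly output.

    #!/usr/bin/env python
    """xq-find: a find-like front end for a metadata query API (illustrative only)."""
    import argparse
    import requests

    # Hypothetical Express Query-style endpoint -- not the real API.
    QUERY_URL = "https://storeall.example.com/api/query"

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("path", help="directory to search under")
        parser.add_argument("-name", help="file name pattern, e.g. '*.log'")
        parser.add_argument("-mtime", type=int, help="modified more than N days ago")
        args = parser.parse_args()

        # Translate familiar find-style flags into query parameters.
        params = {"path": args.path}
        if args.name:
            params["name"] = args.name
        if args.mtime is not None:
            params["modified_older_than_days"] = args.mtime

        results = requests.get(QUERY_URL, params=params).json()

        # One path per line: friends with grep, sed and xargs.
        for entry in results.get("files", []):
            print(entry["path"])

    if __name__ == "__main__":
        main()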

To HP's credit, they have written some sort of plugin for the Windows search application that gives Windows users the ability to easily use Express Query (I did not see this in action). Nothing so transparent exists for Linux though.

My main questions, though, were things HP was unable to answer. I expected more from HP on this front in general, specifically around competitive info. In many cases they seem to rely on the publicly available information on the competition's platforms – maybe limited to the data that is available on the vendor's website. HP is a massive organization with very deep pockets – you may know where I'm going here. GO BUY THE PRODUCTS! Play with them, learn how they work, test them, benchmark them. Get some real world results out of the systems. You have the resources, you have the $$. I can understand that back when 3PAR was a startup they may not have been able to go around buying arrays from the competition to put through their paces. Big ol' HP should be doing that on a regular basis. Maybe they are — if they are, they don't seem to be admitting that their data is based on that (in fact I've seen a few times where they explicitly say the information is only from data sheets etc).

Another approach, if HP lacks the manpower to run such tests, might be to have partners or customers do it for them. Offer to subsidize the cost of a purchase of competitive equipment by some customer in exchange for complete open access to the competitive information that results from using such a system. Or fully cover the cost – HP has the money to make it happen. It would be really nice to see.

So in regards to Express Query I had two main questions about performance relative to the competition. HP says they view Isilon as the main competition for StoreAll. A couple of years back Isilon started offering a product (maybe it is standard across the board now, I am not sure) where they store the metadata on SSD. This dramatically accelerates these sorts of operations without forcing the user to change their way of doing things – it lowers the bar for adoption. Price wise it probably costs more; StoreAll does not have any special SSD support whatsoever. But I would be curious about the performance in general comparing Isilon's accelerated metadata vs HP Express Query. Obviously Express Query is more flexible with its custom metadata fields and tagging, so for specific use cases there is no comparison. BUT… for many things I think both would work well.

Along the same lines – back when I was a BlueArc customer, one of their big claims was that their FPGA accelerated file system had lightning fast metadata queries. So how does Express Query performance compare to something like that? I don't know, and got no answer from HP.

Overall

Overall, I'd love it if HP had a more solid product in this area. Whoever I talk to, it feels like they have just given up trying to be competitive with an in-house enterprise file offering (they do have a file offering based on Windows Storage Server, but I don't really consider that in the same ballpark since they are just re-packaging someone else's product). HP StoreAll has its use cases and it probably does those fairly well, but it's not a general purpose file platform, and from the looks of things it's never going to be one.

Software Defined Storage

Just hearing the phrase Software Defined <anything> makes me shudder. Our friends over at El Reg have started using the term hype gasm when referring to Software Defined Networking. I believe the SDS space is going to be even worse, at least for a while.

(On a side note there was an excellent presentation on SDN at the conference today which I will write about once I have the password to unlock the presentations so I can refresh my memory on what was on the slides – I may not get the password until Friday though)

As of today, the term is not defined at all. Everyone has their own take on it, and that pisses me off as a technical person. From a personal standpoint I am in the camp leaning more towards some separation of data and control planes a la SDN, but time will tell what it ends up being.

I think Software Defined Storage, if it is not some separation of control and data planes, could instead just be a subsystem of sorts that provides storage services to anything that needs them. In my opinion it doesn't matter if it comes from a hardware platform such as 3PAR, or a software platform such as a VSA. The point is you have a pool of storage which can be provisioned in a highly dynamic, highly available manner to whatever requests it. At some point you have to buy hardware – having infrastructure that is purpose built and shared is obviously a commonly used strategy today. The level of automation and API-type functionality varies widely of course.

The point here is I don't believe the software side of things means it has to be totally agnostic as to where it runs – it just needs standard interfaces through which anything can address it (APIs, storage protocols etc). It's certainly nice to have the ability to run such a storage system entirely as a VM; there are use cases for that for certain. But I don't want to limit things to just that. So perhaps more focus on the experience the end user gets rather than how you get there. Something like that.

StoreVirtual

HP’s take on it is of course basically storage resources provisioned through VSAs. Key points being:

  • Software only (they also offer a hardware+software integrated package so…)
  • Hypervisor agnostic (right now that includes VMware and Hyper-V so not fully agnostic!)
  • Federated

I have been talking with HP off and on for months now about how confusing I find their messaging around this.

In one corner we have:

3PAR Eliminating boundaries.

In the other corner we have

Software Defined Data Center – Storage (3PAR is implied to be Service Refined Storage – Storevirtual is Cost Optimized)

Store Virtual key features

Mixed messages

(thinking from a less technical user’s perspective – specifically thinking back to some of my past managers who thought they knew what they were doing when they really didn’t – I’m looking out for other folks like me in the field who don’t want their bosses to get the wrong message when they see something like this)

What’s the difference between 3PAR’s SLA Optimized storage when value matters, and StoreVirtual Cost Optimized?

Hardware agnostic and federated sounds pretty cool – why do I need 3PAR when I can just scale out with StoreVirtual? Who needs fancy 3PAR Peer Persistence (a fancy name for transparent full-array high availability) when I have built-in disaster recovery on the StoreVirtual platform?

Expensive 3PAR software licensing? StoreVirtual is all inclusive! The software is the same right? I can thin provision, I can snapshot, I can replicate, I can peer motion between StoreVirtual systems. I have disaster recovery, I have iSCSI, I have Fibre channel. I have scale out and I have a fancy shared-nothing design. I have Openstack integration. I have flash support, I have tiering, I have all of this with StoreVirtual. Why did HP buy 3PAR when they already have everything they need for the next generation of storage?

(stepping back into technical role now)

Don’t get me wrong – I do see some value in the StoreVirtual platform! It’s really flexible, easy to deploy, and can do wonders to lower costs in certain circumstances – especially branch office type stuff. If you can deploy 2-3 VM servers at an edge office and leverage HA shared storage without a dedicated array I think that is awesome.

But for data center and cloud deployments – do I run StoreVirtual as my primary platform, or do I run 3PAR? The messaging is highly confusing.

My idea to HP on StoreVirtual vs. 3PAR

I went back and forth with HP on this and finally, out of nowhere I had a good idea which I gave to them and it sounds like they are going to do something with it.

So my idea was this – give the customer a set of questions, and based on the answers of those questions HP would know which storage system to recommend for that use case. Pretty simple idea. I don’t know why I didn’t come up with it earlier (perhaps because it’s not my job!). But it would go a long way in cleaning up that messaging around which platform to use. Perhaps HP could take the concept even further and update the marketing info to include such scenarios (I don’t know how that might be depicted, assuming it can be done so in a legible way).

When I gave that idea, several people in the room liked it immediately, so that felt good 🙂

HP StoreOnce

(This segment of the market I am not well versed in at all, so my error rate is likely to go up by a few notches)

HP StoreOnce is of course their disk-based scale-out dedupe system developed by HP Labs. One fairly exciting offering in this area that was recently announced at HP Discover is the StoreOnce VSA. Really nice to see the ability to run StoreOnce as a VM image for small sites.

They spent a bunch of time talking about how flexible the VSA is – how you can vMotion it and Storage vMotion it – like it was something special. It's a VM; it's obvious you should be able to do those things without a second thought.

StoreOnce is claimed to have a fairly unique capability of being able to replicate between systems without ever re-hydrating the data. They also claim to be the first platform to offer real high availability. In a keynote session by David Scott (which I will cover in more depth in a future post once I get copies of those presentations) he mentioned Data Domain as an example: if a DD controller fails during a backup or restore operation, the operation fails and must be restarted after the controller is repaired.

This surprised me – what’s the purpose of dual controllers if not to provide some level of high availability? Again forgive my ignorance on the topic as this is not an area of the storage industry I have spent much time at all in.

HP StoreOnce, however, can recover from this without significant downtime. I believe the backup or restore job still must be restarted from scratch, but you don't have to wait for the controller to be repaired to continue with work.

HP has claimed for at least the past 12 to 18 months that their performance far surpasses everyone else by a large margin, and they continued those claims this week. I believe I read at one point from their competition that the claims were not honest, in that the performance numbers came from a clustered StoreOnce system which has multiple dedupe domains (meaning no global dedupe on the system as a whole), so it was more like testing multiple systems in parallel against a single Data Domain system (with global dedupe). I think there were some other caveats as well but I don't recall specifics (this is from stuff I want to say I read 18-20 months ago).

In any case, the product offering seems pretty decent, is experiencing a good amount of growth and looks to be a solid offering in the space. Much more competitive in the space than StoreAll is, probably not quite as competitive as 3PAR, perhaps a fairly close 2nd as far as strength of product offering.

Next up: converged storage management and Openstack with HP. Both of these topics are very light relative to the rest of the material; I am going to go back to the show room floor to see if I can dig up more info.


July 30, 2013

HP Storage Tech Day – 3PAR

Filed under: Events, Storage — Nate @ 2:04 am

Before I forget again..

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

So, HP hammered us with a good seven to eight hours of storage related stuff today, the bulk of the morning was devoted to 3PAR and the afternoon covered StoreVirtual, StoreOnce, StoreAll, converged management and some really technical bits from HP Labs.

This post is all about 3PAR. They covered other topics of course but this one took so long to write I had to call it a night, will touch on the other topics soon.

I won’t cover everything since I have covered a bunch of this in the past. I’ll try not to be too repetitive…

I tried to ask as many questions as I could, and they answered most. The rest I'll likely get with another on-site visit to 3PAR HQ after I sign another one of those Nate Disclosure Agreements (which means I can't tell you unless your name is Nate). I always feel guilty about asking questions directly to the big cheeses at 3PAR; I don't want to take up any of their valuable time…

There wasn't anything new announced today of course, so none of this information is new, though some of it is new to this blog, anyway!

I suppose if there is one major takeaway for me from this SSD deep dive, it is the continued insight into how complex storage really is, and how well 3PAR does at masking that complexity and extracting the most out of the underlying hardware.

Back when I first started on 3PAR in late 2006, I really had no idea what real storage was. As far as I was concerned, one dual controller system with 15K disks was the same as the next. Storage was never my focus early in my career (I did dabble in a tiny bit of EMC Clariion (CX6/700) operations work – though when I saw the spreadsheets and Visios the main folks used to plan and manage it, I decided I didn't want to get into storage); it was more servers, networking, etc.

I learned a lot in the first few years of using 3PAR, and to a certain extent you could say I grew up on it. As far as I am concerned being able to wide stripe, or have mesh active controllers is all I’ve ever (really) known. Sure since then I have used a few other sorts of systems. When I see architectures and processes of doing things on other platforms I am often sort of dumbfounded why they do things that way. It’s sometimes not obvious to me that storage used to be really in the dark ages many years ago.

Case in point below, there’s a lot more to (efficient, reliable, scalable, predictable) SSDs than just tossing a bunch of SSDs into a system and throwing a workload at them..

I’ve never tried to proclaim I am a storage expert here(or anywhere) though I do feel I am pretty adept at 3PAR stuff at least, which wasn’t a half bad platform to land on early on in the grand scheme of things. I had no idea where it would take me over the years since. Anyway, enough about the past….

New to 3PAR

Still the focus of the majority of HP storage related action these days, they had a lot to talk about. None of this initial stuff is shipping yet (up until the 7450 section below); it's just what they are planning for at some point in the future (no time frames on anything that I recall hearing).

Asynchronous Streaming Replication

Just a passing mention of this on a slide, nothing in depth to report about, but I believe the basic concept is that instead of having asynchronous replication running off snapshots that kick off every few minutes (perhaps every five minutes), the replication process would run much more frequently (though still not synchronously), perhaps as frequently as every 30 seconds or so.

I’ve never used 3PAR replication myself. Never needed array based replication really. I have built my systems in ways that don’t require array based replication. In part because I believe it makes life easier(I don’t build them specifically to avoid array replication it’s merely a side effect), and of course the license costs associated with 3PAR replication are not trivial in many circumstances(especially if your only needing to replicate a small percentage of the data on the system). The main place where I could see leveraging array based replication is if I was replicating a large number of files, doing this at the block layer is often times far more efficient(and much faster) than trying to determine changed bits from a file system perspective.

I wrote/built a distributed file transfer architecture/system for another company a few years ago that involved many off the shelf components (highly customized) and was responsible for replicating several TB of data a day between WAN sites. It was an interesting project and proved to be far more reliable and scalable than I could have hoped for initially.

Increasing Maximum Limits

I think this is probably out of date, but it's the most current info I could dig up on HP's site, though it dates back to 2010. These pending changes are all about massively increasing the various supported maximum limits of various things. They didn't get into specifics. I think for most customers this won't really matter since they don't come close to the limits in any case (maybe someone from 3PAR will read this and send me more up to date info).

3PAR OS 2.3.1 supported limits (2010)

The PDF says updated May 2013, though the change log says the last update was in December. HP has put out a few revisions to the document (which is the Compatibility Matrix) that specifically address hardware/software compatibility, but the most recent Maximum Limits that I see are for the now quite old 2.3.1 release – which was before their migration to a 64-bit OS (3.1.1).

Compression / De-dupe

They didn’t talk about it, other than mention it on a slide, but this is the first time I’ve seen HP 3PAR publicly mention the terms. Specifically they mention in-line de-dupe for file and block, as well as compression support. Again, no details.

Personally I am far more interested in compression than I am de-dupe. De-dupe sounds great for very limited workloads like VDI(or backups, which StoreOnce has covered already). Compression sounds like a much more general benefit to improving utilization.

I already get some level of "de-duplication" myself by using snapshots. My main 3PAR array runs roughly 30 MySQL databases entirely from read-write snapshots; part of the reason for this is to reduce duplicate data, and another part is to reduce the time it takes to produce that duplicate data for a database (a fraction of a second as opposed to several hours for a full data copy).

File + Object services directly on 3PAR controllers

No details here other than a mention of layering native file/object services on top of the existing block services. They did mention they believe this would fit well in the low end segment; they don't believe it would work well at the high end since things scale in different ways there. Obviously HP has file/object services in the IBRIX product (though HP did not get into specifics on what technology would be used, other than taking tech from several areas inside HP), and a 3PAR controller runs Linux after all, so it's not too far-fetched.

I recall several years ago back when Exanet went bust, I was trying to encourage 3PAR to buy their assets as I thought it would have been a good fit. Exanet folks mentioned to me that 3PAR engineering was very protective of their stuff and very paranoid about running anything other than the core services on the controllers – it is sensitive real estate after all. With more recent changes, such as supporting the ability to run their reporting software (System Reporter) directly on the controller nodes, I'm not sure if this is something engineering volunteered to do themselves or not. Both approaches have their strengths and weaknesses obviously.

Where are 3PAR’s SPC-2 results?

This is a question I asked them (again). 3PAR has never published SPC-2 results. They love to tout their SPC-1, but SPC-2 is not there… I got a positive answer though: stay tuned. So I have to assume something is coming, at some point. They aren't outright disregarding the validity of the test.

In the past 3PAR systems have been somewhat bandwidth constrained due to their use of PCI-X. Though the latest generation of stuff (7xxx/10xxx) all leverage PCIe.

The 7450 tops out at 5.2 Gigabytes/second of throughput, a number which they say takes into account the overhead of a distributed volume system (it otherwise might be advertised as 6.4 GB/sec, since a 2-node system does 3.2 GB/sec). Given they now admit the overhead of a distributed system, I wonder how, or whether, that throws off the throughput metrics they previously published for their past arrays.

I have a slide here from a few years ago that shows an 8-controller T800 supporting up to 6.4GB/sec of throughput, and a T400 doing 3.2GB/sec (both of these systems were released in Q3 of 2008). Obviously the newer 10400 and 10800 go higher (I don't recall off the top of my head how much higher).

This compares to published SPC-2 numbers from IBM XIV at more than 7GB/sec, as well as HP P9500/HDS VSP at just over 13GB/sec.

3PAR 7450

Announced more than a month ago now, the 7450 is of course the purpose built flash platform, which is, at the moment, all SSD.

Can it run with spinning rust?

One of the questions I had was about the 7450 currently only being available in an SSD-only configuration – no spinning rust is supported. I asked why this was, and the answer was pretty much what I expected. Basically they were getting a lot of flak for not having something that was purpose built. So at least in the short term, the decision not to support spinning rust is purely a marketing one. The hardware is the same as the other 3PAR platforms (other than beefier CPUs and more RAM), and the software is identical. They just didn't want to give people more excuses to label the 3PAR architecture as something that wasn't fully flash ready.

It is unfortunate that the market has compelled HP to do this, as other workloads would still stand to gain a lot especially with the doubling up of data cache on the platform.

Still CPU constrained

One of the questions asked by someone was whether or not the ASIC is the bottleneck in the 7450 I/O results. The answer was a resounding NO – the CPU is still the bottleneck even at max throughput. So I followed up asking why HP chose to go with 8-core CPUs instead of the 10-core parts which Intel of course has had for some time. You know how I like more cores! The answer was two fold. The primary reason was cooling (the enclosure as-is has two sockets, two ASICs, two PCIe slots, 24 SSDs, 64GB of cache and a pair of PSUs in 2U). The second was that the system is technically Ivy Bridge capable, but they didn't want to wait around for those chips to launch before releasing the system.

They covered a bit about the competition being CPU limited as well, especially with data services, and how the amount of I/O per CPU cycle is much lower on competing systems vs 3PAR with its ASIC. The argument is an interesting one, though at the end of the day the easy way to address that problem is to throw more CPUs at it – they are fairly cheap after all. The 7000-series is really dense, so I can understand the inability to fit a pair of dual socket systems within a 2U enclosure along with everything else. The 10400/10800 are dual socket (though with an older generation of processors).

TANGENT TIME

I really haven't much cared for Intel's code names for their recent generations of chips. I don't follow CPU stuff all that closely these days (haven't for a while), but I have to say it's mighty easy to confuse code name A with code name B – which is newer? I have to look it up. every. single. time.

I believe in the AMD world (AMD seems to have given up on the high end, sadly), while they have code names, they have model numbers as well. I know 6200 is newer than 6100, and 6300 is newer than 6200 – it's pretty clear and obvious. I believe Intel's reliance on code names goes back to them not being able to trademark the 486.

On the same note, I hate Intel continuing to re-use the i7 name in laptops. I have a Core i7 laptop from 3 years ago, and guess what the top end today still seems to be? I think it's still i7. Confusing. Again.
</ END TANGENT >

Effortless SSD management of each SSD with proactive alerts

I wanted to get this in before going deeper into the cache optimizations since that is a huge topic. The basic gist is that they have good monitoring of the wear of each SSD in the platform (something I think was available on Lefthand a year or two ago); in addition, the service processor (the dedicated on site appliance that monitors the array) will alert the customer when an SSD is 90% worn out. When an SSD gets to 95%, the system pro-actively fails the drive and migrates data off of it (I believe). They repeated a statistic that was brought up at Discover: something along the lines of 95% of all SSDs deployed in 3PAR systems are still in the field – very few have worn out. I don't recall anyone mentioning the number of SSDs that have been deployed on 3PAR, but it's not an insignificant number.
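
If it helps to picture what that wear policy looks like, here is a tiny sketch of the logic as I understand it – entirely my own illustration, not HP code; only the 90%/95% thresholds come from the session, and the drive IDs are made up:

    # Hypothetical sketch of the wear policy described above: warn at 90% wear,
    # proactively fail and evacuate the drive at 95%. My illustration, not HP's code.
    def check_ssd_wear(drive_id: str, wear_pct: float) -> str:
        """Return the action the array would take for a drive at this wear level."""
        if wear_pct >= 95.0:
            # proactively fail the drive and migrate its data elsewhere
            return f"fail {drive_id}: evacuate data and request a replacement"
        if wear_pct >= 90.0:
            # the service processor raises an alert so a replacement can be ordered
            return f"alert {drive_id}: {wear_pct:.0f}% of rated write endurance used"
        return f"ok {drive_id}"

    for drive, wear in [("0:2:1", 42.0), ("0:2:2", 91.5), ("0:2:3", 96.2)]:
        print(check_ssd_wear(drive, wear))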

SSD Caching Improvements in 3PAR OS 3.1.2

There have been a number of non-trivial caching optimizations in the 3PAR OS to maximize performance as well as the life span of SSDs. Some of these optimizations also benefit spinning rust configurations – I have personally seen a noticeable drop in back end disk response time since I upgraded to 3.1.2 back in May (it was originally released in December), along with, I believe, better response times under heavy load on the front end.

Bad version numbers

I really dislike 3PAR’s version numbering, they have their reasons for doing what they do, but I still think it is a really bad customer experience. For example going from 2.2.4 to 2.3.1 back in what was it 2009 or 2010. The version number implies minor update, but this was a MASSIVE upgrade.  Going from 2.3.x to 3.1.1 was a pretty major upgrade too (as the version implied).  3.1.1 to 3.1.2 was also a pretty major upgrade. On the same note the 3.1.2 MU2 (patch level!) upgrade that was released last month was also a major upgrade.

I’m hoping they can fix this in the future, I don’t think enough effort is made to communicate major vs minor releases. The version numbers too often imply minor upgrades when in fact they are major releases. For something as critical as a storage system I think this point is really important.

Adaptive Read Caching

3PAR Adaptive Read Caching for SSD (the extra bits being read from the back end are there to support the T10 Data Integrity Feature – available standard on all Gen4 ASIC 3PAR systems, and a capability 3PAR believes is unique to them in the all flash space)

One of the things they covered with regard to caching on SSD is that the read cache is really not as effective (vs with spinning rust): because the back end media is so fast, there is significantly less need to cache reads. So in general, significantly more cache is used for writes.

For spinning rust 3PAR reads a full 16kB of data from the back end disk regardless of the size of the read on the front end (e.g. 4kB). This is because the operation to go to disk is so expensive already that there is no added penalty to grab the other 12kB while you're grabbing the 4kB you need. The next I/O request might want part of that 12kB, and you save yourself a second trip to the disk by doing this.

With flash things are different. Because the media is so fast, you are much more likely to become bandwidth constrained rather than IOPS constrained. So if, for example, you have that 500,000 4kB read IOPS on the front end and you're performing those same I/Os as 16kB reads on the back end, that is 4x more bandwidth required to perform those operations. Again, because the flash is so fast there is significantly less penalty to go back to the SSD again and again to retrieve those smaller blocks. It also improves latency of the system.

So in short, read more from disks because you can and there is no penalty, read only what you need from SSDs because you should and there is (almost) no penalty.
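
To put that back end read-size decision into a few lines of Python (my own simplification of the slide, not 3PAR's cache code – the 16kB page size is the only figure taken from the talk):

    # Sketch of the read-size decision described above: on spinning disk the seek
    # dominates, so read the whole 16kB page for free; on SSD bandwidth is the
    # scarce resource, so read only the bytes that were actually requested.
    PAGE_SIZE = 16 * 1024

    def backend_read_size(requested_bytes: int, media: str) -> int:
        if media == "hdd":
            pages = -(-requested_bytes // PAGE_SIZE)  # ceiling division
            return pages * PAGE_SIZE                  # extra bytes ride along for free
        return requested_bytes                        # ssd: fetch only what was asked for

    print(backend_read_size(4 * 1024, "hdd"))  # 16384
    print(backend_read_size(4 * 1024, "ssd"))  # 4096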

Adaptive Write Caching


With writes the situation is similar to reads: to maximize SSD life span and minimize latency, you want to minimize the number of write operations to the SSD whenever possible.

With spinning rust, again, 3PAR works with 16kB pages: if a 4kB write comes in then the full 16kB page is written to disk, because there is no additional penalty for writing 16kB vs writing 4kB. Unlike with SSDs, you're not likely to be bandwidth constrained when it comes to disks.

With SSDs, the optimization they perform (again to maximize performance and reduce wear) is that if a 4kB write comes in, a 16kB write still occurs in cache, but only the 4kB of changed data is committed to the back end.

If I recall right they mentioned this operation benefits RAID 1 (anything RAID 1 in 3PAR is RAID 10, same for RAID 5 – it’s RAID 50) significantly more than it benefits RAID 5/6, but it still benefits RAID 5/6.
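
A rough sketch of that write path, as I understood it (again my own illustration of the behaviour described above, not actual 3PAR code – the 4kB dirty-tracking granularity within a page is my assumption):

    # Illustration of the write-cache behaviour described above: the cache always
    # works in 16kB pages, but on SSD only the bytes that actually changed are
    # destaged, saving both back end bandwidth and flash wear.
    PAGE_SIZE = 16 * 1024
    SECTOR = 4 * 1024  # assumed dirty-tracking granularity within a page

    class CachePage:
        def __init__(self):
            self.dirty = [False] * (PAGE_SIZE // SECTOR)

        def write(self, offset: int, length: int):
            for s in range(offset // SECTOR, -(-(offset + length) // SECTOR)):
                self.dirty[s] = True

        def destage_bytes(self, media: str) -> int:
            if media == "hdd":
                return PAGE_SIZE                # whole page; no penalty on disk
            return sum(self.dirty) * SECTOR     # ssd: only the changed sub-blocks

    page = CachePage()
    page.write(0, 4 * 1024)           # a single 4kB front end write
    print(page.destage_bytes("hdd"))  # 16384
    print(page.destage_bytes("ssd"))  # 4096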

Autonomic Cache Offload


Here the system changes the frequency at which it flushes cache to back end media based on utilization. I think this plays a lot into the next optimization.

Multi Tenant I/O Processing

3PAR has long been about multi tenancy on their systems. The architecture lends itself well to running in this mode, though it wasn't perfect; I believe the addition of Priority Optimization, which was announced late last year and finally released last month, fills the majority of the remainder of that hole. I have run "multi tenant" 3PAR systems since the beginning. Now to be totally honest the tenants were all me, just different competing workloads, whether it was disparate production workloads or a mixture of production and non production (and yes, in all cases they ran on the same spindles). It wasn't nearly as unpredictable as, say, a service provider with many clients running totally different things – that would sort of scare me on any platform. But there were still many times where rogue things (especially horrible SQL queries) overran the system (especially write cache). 3PAR handles it as well as, if not better than, anyone else, but every system has its limits.

Front end operations

The cache flushing process to back end media is now multi threaded. This benefits SSD as well as existing spinning rust configurations – significantly less (no?) locking is involved when flushing cache to disk.

Here is a graph from my main 3PAR array; you can see the obvious latency drop from the back end spindles once 3.1.2 was installed back in May (again, the point of this change was not so much to improve back end disk latency as it was to improve front end latency, but there is a significant positive behavior change post upgrade):

Latency Change on back end spinning rust with 3.1.2

There was a brief time when latency actually went UP on the back end disks. I was concerned at first but later determined this was the disk defragmentation processes running (again with improved algorithms). Before the upgrade they took FAR too long; post upgrade they worked through a big backlog in a few days and latency returned to low levels.

Multi Tenant Caching


Back end operations

On the topic of multi tenancy with SSDs, an interesting point was raised which I had never heard before. They even called it out as a problem specific to SSDs, one that does not exist with spinning rust. Basically the issue is that if you have two workloads going to the same set of SSDs, one of them issuing large I/O requests (e.g. a sequential workload) and the other issuing small I/O requests (e.g. 4kB random reads), the smaller I/O requests will often get stuck behind the larger ones, increasing latency for the app using the smaller I/O requests.

To address this, the 128kB I/Os are divided up into four 32kB I/O requests issued in parallel alongside the other workload. I suppose I could get clarification, but I assume that for a sequential read operation with a 128kB I/O request there must not be any additional penalty for grabbing it in 32kB chunks, vs splitting it up even further into even smaller I/Os.
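
Here is a minimal sketch of that splitting idea – the 128kB and 32kB figures are from the session, but the round-robin interleave between the two queues is purely my guess at how the fairness would work:

    # Sketch of the large-I/O splitting described above: a 128kB request is broken
    # into four 32kB chunks so a competing 4kB request never waits behind the whole
    # thing. The round-robin scheduling between queues is my assumption.
    from collections import deque

    CHUNK = 32 * 1024

    def split(name, size):
        return [(name, min(CHUNK, size - off)) for off in range(0, size, CHUNK)]

    sequential = deque(split("seq-128k", 128 * 1024))  # four 32kB pieces
    random_io = deque(split("rand-4k", 4 * 1024))      # one 4kB piece

    while sequential or random_io:                     # interleave the two queues
        for queue in (random_io, sequential):
            if queue:
                print("issue", queue.popleft())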

Maintaining performance during media failures

3PAR has always done wide striping and sub-disk distributed RAID, so rebuild times are faster, latency is lower, and all around things run better (no idle hot spares) vs the legacy designs of the competition. The system again takes additional steps now to maximize SSD life span by optimizing the data reads and writes under a failure condition.

HP points out that SSDs are poor at large sequential writes, so as mentioned above they divide the 128kB writes that would be issued during a rebuild operation (since that is largely a sequential operation) into 32kB I/Os again to protect those smaller I/Os from getting stuck behind big I/Os.

They also mentioned that during one of the SPC-1 tests (not sure if it was 7400 or 7450) one of the SSDs failed and the system rebuilt itself. They said there was no significant performance hit(as one might expect given experience with the system) as the test ran. I’m sure there was SOME kind of hit especially if you drive the system to 100% of capacity and suffer a failure. But they were pleased with the results regardless. The competition would be lucky to have something similar.

What 3PAR is not doing

When it comes to SSDs and caching something 3PAR is not doing, is leveraging SSDs to optimize back end I/Os to other media as sequential operations. Some storage startups are doing this to gain further performance out of spinning rust while retaining high random performance using SSD. 3PAR doesn’t do this and I haven’t heard of any plans to go this route.

Conclusion

I continue to be quite excited about the future of 3PAR, even more so than before the acquisition. HP has been able to execute wonderfully on the technology side of things. Sales, from all accounts (at least on the 7000 series), are still quite brisk. Time will tell if things hold up after EVA is completely off the map, but I think they are doing many of the right things. I know even more of course but can't talk about it here (yet)!!!

That’s it for tonight, at ~4,000 (that number keeps going up, I should goto bed) words this took three hours or more to write+proof read, it’s also past 2AM. There is more to cover, the 3PAR stuff was obviously what I was most interested in. I have a few notes from the other sessions but they will pale in comparison to this.

Today I had a pretty good idea on how HP could improve its messaging around whether to choose 3PAR or StoreVirtual for a particular workload. The messaging to date has been very confusing and conflicting to me (HP tried to drive home a point about single platforms and reducing complexity, something this dual message seems to conflict with). I have been communicating with HP off and on for the past few months, and today out of the blue I came up with this idea which I think will help clear the air. I'll touch on this soon when I cover the other areas that were talked about today.

Tomorrow seems to be a busy day; apparently we have front row seats, and are the only folks with power feeds. I won't be "live blogging" (as some folks tend to love to do) – I'll leave that to others. I work better spending some time to gather my thoughts and writing something significantly longer.

If you are new to this site you may want to check out a couple of these other articles I have written about 3PAR(among the dozens…)

Thanks for reading!

July 29, 2013

HP Storage Tech Day Live

Filed under: Storage — Tags: , , — Nate @ 12:38 am

In about seven and a half hours the HP tech day event will be starting. I thought it was going to be a private event, but it looks like they will be broadcasting it – one of the folks here is adept at that sort of thing.

If you're interested, the info is here. It starts at 8AM Pacific. Fortunately it's literally downstairs from my hotel room.

Topics include

  • 3PAR deep dive (~3 hrs worth)
  • StoreVirtual (Lefthand VSA), StoreOnce VSA, and OpenStack integration
  • StoreAll (IBRIX), Express Query (Autonomy)

No Vertica stuff.. Vertica doesn’t get any storage people excited since it is so fast and reduces I/O by so much.. so you don’t need really fancy storage stuff to make it fly.

HP asked that I put this notice on these posts so the FCC doesn’t come after us..

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

(Many folks have accused me of being compensated by 3PAR and/or HP in the past based on what I have written here, but I never have been – and by no means do I love HP as a whole; there are really only a few product lines that have me interested, which are 3PAR, ProLiant, and Vertica – ironically enough each of those came to HP via acquisition.) I have some interest in StoreOnce though have yet to use it. (Rust in Peace WebOS — I think I will be on Android before the end of the year – more on that later..)

I’m pretty excited about tomorrow (well today given that it’s almost 1AM), though getting up so early is going to be a challenge!

Apparently I’m the only person in the group here that is not on twitter. I don’t see that changing anytime soon. Twitter and Facebook are like the latest generation of Star Trek movies, they basically don’t exist to me.

The one thing that I am sort of curious about is what plans, if any, HP has for the P9500 range; they don't talk about it much. I'm sure they won't come out and say they are retiring it any time soon, since it's still fairly popular with select customers. I just want to try to get them to say something about it, I am curious.

(this is my first trip to a vendor-supported event that included travel)

July 24, 2013

Opscode is learning

Filed under: Random Thought — Tags: — Nate @ 10:11 am

A few months ago I wrote a long rant on how Opscode has a lot to learn about running operations.

Problems included:

  • status.opscode.com site returning broken HTTP Location headers which broke standards compliant browsers (I had reported this issue to them through multiple channels and support tickets for a good 7 months)
  • Taking scheduled downtime in the middle of a business day

It APPEARS someone read that post, because recently the status site was fixed. It now redirects to opscode.tumblr.com and since that time I have seen no issues with the site.

Also I see they have scheduled downtime for some of their databases, and they are scheduling it for 9PM Pacific time (Opscode is HQ'd in the Pacific time zone), instead of, say, one in the afternoon. Obviously people in far away time zones may not like that as much, but it makes sense for their U.S. customers (which I'd imagine are the bulk of their customer base, but I don't know).

They’ve also gone through some effort to post analysis on outages/performance issues recently as well which is nice.

I have a few remaining requests, in case Opscode is reading:

  • Schedule downtime further in advance. The most recent announcement provided about 48 hours of notification; I think it'd be better to provide one week of notice. Take a look at what other service providers do for planned outages – my experience says 48 hours is not sufficient notice for scheduled downtime. If it's an emergency then obviously a shorter window is acceptable, just say it's an emergency and try to explain why.
  • Provide actual dates and times for the posts on the status site. Right now it just says things like "17 hours ago" or "5 days ago".
  • Be consistent on the time zone used. Some posts use UTC, others(scheduled events) refer to Pacific time. I don’t care which myself (well honestly I prefer Pacific since I am in that zone, but I can understand using UTC too).
  • Provide pro-active notification of any customer impacting maintenance. Maybe all of their other customers follow them on Twitter, I don't know – I don't use Twitter myself. So an email notification option (perhaps opt-in by default) to customer addresses registered with the platform would be good to consider for such things.

Now as for Chef, there’s tons of things that could be improved with Chef to make it easier to use.. My latest issue is whenever I pull up the JSON to edit an environment, or a node or whatever the order is not consistent. My co-worker says the data is not ordered, and it has never been consistent for him, for me the issues just started a few weeks/month or two ago. It’s quite annoying. For example if I want to change the role of a node, I would knife node edit <hostname>, then skip to the end of the file, and change the role.  Now sometimes the role is at the top of the file, other times it is at the bottom (it’s never shown up as in the middle).

Pick a way to display the information and display it consistently! How hard is that to do? It's not as if I can pipe the JSON to the sort command and have it sorted for me. I've never liked JSON for that reason — my saying is: if it's not friendly with grep and sed, it's not a friend to me. Or something like that. JSON seems to be almost exactly the opposite of what Linux admins want to deal with; it's almost as bad as binary data, I just hate it. If I don't have to deal with it (e.g. it's used in applications and I never see it) – fine, go nuts. Same goes for XML. I used to support a big application whose developers were gung ho for XML config files – we literally had several hundred. It would take WEEKS (literally) of configuration auditing (line by line) prior to deployment, and even then there were still problems. It was a massive headache. Using JSON just brings me back to those days.

The syntax is so delicate as well: one extra comma or missing quote and the whole thing blows up (it wouldn't be so bad if the tool ran a simple syntax check, pointed out what the error was and what line it was on, and returned you to the editor to fix it, but in this case it just bombs and you lose all your changes – Opscode folks, look at visudo, it does this better..)
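
For what it's worth, a few lines of Python take care of both gripes before you hand the file back to knife – the json module points at the offending line on a syntax error, and re-dumping with sorted keys makes the ordering stable. This is just a work-around sketch of my own, not anything Opscode ships:

    # Work-around sketch (mine, not Opscode's): validate a node/environment JSON
    # file before submitting it, and rewrite it with sorted keys so the attribute
    # order stops jumping around between edits.
    import json
    import sys

    path = sys.argv[1]  # e.g. a file saved out of `knife node edit`
    try:
        with open(path) as f:
            data = json.load(f)
    except ValueError as e:
        # the error message includes the line and column of the stray comma/quote
        sys.exit(f"syntax error in {path}: {e}")

    with open(path, "w") as f:
        json.dump(data, f, indent=2, sort_keys=True)
        f.write("\n")
    print(f"{path} is valid JSON; keys rewritten in sorted order")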

The only thing worse (off the top of my head) than the syntax of the Chef recipes themselves is the JSON.. or maybe that should be vice versa..

Opscode and Chef are improving, I guess, is the point; maybe in the next decade or so it will become a product that I would consider usable by mere mortals.

Going to Nth Symposium 2013 – Anaheim, CA

Filed under: Events — Tags: — Nate @ 9:28 am

I have been MIA for a while – sorry about that. I hope to write more soon; there are a few things I want to touch on, though I have been fairly busy (and lazy, I admit) and haven't gotten around to writing.

I wanted to mention though I’ll be going to the Nth Symposium 2013 down in Anaheim CA next week. HP invited me and I was able to take them up on this one. This is the first time I’ve accepted anything like this where HP is covering travel and lodging costs for myself and a dozen or so other bloggers etc. They say I’m under no obligation to write anything, good or bad about anything I see. I imagine there will be at least a couple posts.. I’m way late writing about the new 3PAR flash stuff, I’ve been busy enough I haven’t gotten the briefing from them on the details. I’ll get that next week, and be able to ask any questions I may have on it. Oh, I have to remember HP gave me a special disclaimer to put on those blog posts to say how they paid my way to avoid FCC problems..

While the Nth event does have costs, they seem to waive them if you're qualified (you have to work in IT).

There is also an HP Tech Day on Monday I believe (a private event?); I believe I was told it would be a sort of mini Discover.

I’ve never been much for IT conferences(or any event with lots of people), though I hope this one will be better than past ones I have attended, since at least some of the topics are more my pace, and I’ll know a few people there.

I’ll be in Orange County all of next week(driving down on Sunday leaving Saturday or the following Sunday), HP is covering a few of the days that the conference is at, the rest is out of my pocket. I lived in OC for the latter half of the 90s, so I have some friends there, and the bulk of my immediate family resides there as well.

If you're in the area and want to get some drinks, drop me a line. I don't know what my schedule is yet, other than that Thursday/Friday night I am available for sure (Mon-Wed nights may be partly taken up by HP, I don't know).

June 11, 2013

Pedal to the metal: HP 3PAR 7450

Filed under: Storage — Tags: — Nate @ 8:43 am

[NOTE: I expect to revise this many times – I’m not at HP Discover (maybe next year!), so I am basing this post off what info I have seen elsewhere, I haven’t yet got clarification on what NDA info specifically I can talk about yet so am trying to be cautious !]

[Update: HP’s website now has the info]

I was hoping they would announce the SPC-1 results of this new system, and I was going to wait until that happens, but I am not sure they have them finalized yet. I've heard the ballpark figures, but am waiting for the official results.

The upside is I am on the east coast so I am up bright and early relative to my normal Pacific time zone morning.

I thought it would be announced later in the week, but my first hint was this Russian blog (Google translated), which I saw on LinkedIn a few minutes ago (relative to the time I started this blog post, which took me a good two hours to write). I also came across this press release of sorts, and there is the data sheet for the new system.

In a nutshell, the 7450 is the system that HP mentioned at the launch event for the 7400 last December – though the model number was not revealed at the time, they said:

In addition to mixed SSD/HDD and all-SSD configurations across the HP 3PAR StoreServ family, HP has announced the intent to develop an SSD-optimized hardware model based on the 3PAR operating system.

As fast as the all-SSD 7400 was, that was not the “optimized” hardware model – this one is (the one that was mentioned last December). I think the distinction with the word optimized vs using the phrase purpose built is important to keep in mind.

The changes from a hardware perspective are not revolutionary: 3PAR has, for the first time in their history (as far as I know anyway), fairly quickly leveraged the x86 processors and upgraded both the processors and the memory (the ASIC is the same as the 7400) to provide the faster data ingest rate. I had previously (incorrectly of course) assumed that the ASIC was tapped out with earlier results and perhaps they would need even more ASICs to drive the I/O needs of an all-SSD system. The ASIC will be a bottleneck at some point but it doesn't seem to be today – the bottleneck was the x86 CPUs.

They also beefed up the cache, doubling what the 7400 has.

  • 4-Node 7400: 4 x Intel Xeon 6-core 1.8 GHz w/64GB Cache
  • 4-Node 7450: 4 x Intel Xeon 8-core 2.3 GHz w/128GB Cache

Would have been nice to have seen them use the 10-core chips; maybe the turnaround for such a change would have been too difficult to pull off in a short time frame. 8-core Intel is not bad though.

The Russian blog above touts a 55% increase in performance on the 7450 over the 7400, and the cost is about 6% more (the press release above quotes $99,000 as entry level pricing)

Throughput is touted as 5.5 Gigabytes/second, which won’t win any SPC-2 trophies, but is no slouch either – 3PAR has always been more about random IOPS than sequential throughput (though they often tout they can do both simultaneously within a single array – more effectively than other platforms).

The new system is currently tested (according to the press release) at 540,000 read IOPS @ 0.6ms of latency. Obviously SPC-1 will be less than the 100% random read result. This compares to the 7400, which was tested (under the same 100% read test I believe) at 320,000 IOPS @ 1.6ms of latency. So roughly a 69% improvement in read IOPS and about 62% less latency.

Maybe we could extrapolate that number a bit: the 7400 achieved 258,000 SPC-1 IOPS. A 69% gain would put the 7450 at roughly 435,000 SPC-1 IOPS, which is nearly the score of an 8-node P10800 with its 16 ASICs and 16 quad-core Xeon processors! (That P10800 requires basically a full rack for just the controllers, vs 4U for the 7450 – assuming they can get the full performance out of the controllers with only 48 SSDs.)
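
Back-of-envelope only, using nothing but the figures quoted above (napkin math, not an SPC-1 submission):

    # Napkin math using only the numbers quoted above.
    iops_7400, iops_7450 = 320_000, 540_000
    lat_7400, lat_7450 = 1.6, 0.6        # milliseconds, 100% read test
    spc1_7400 = 258_000                  # published 7400 SPC-1 result

    gain = iops_7450 / iops_7400 - 1
    print(f"read IOPS gain:     {gain:.1%}")                    # ~68.8%
    print(f"latency reduction:  {1 - lat_7450 / lat_7400:.1%}") # ~62.5%
    print(f"extrapolated SPC-1: {spc1_7400 * (1 + gain):,.0f}") # ~435,000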

The blog also talks about the caching improvements targeted at improving the performance and lifetime of the SSDs. The new 3PAR software also has a media wear gauge for the SSDs, something I believe the HP Lefthand stuff got a year or two ago (better late than never!). The graphics the Russian blog has are quite good; I didn't want to shamelessly rip them from their blog to re-post here, so I encourage you to go there to see the details on the caching improvements that are specific to SSD.

The competition

This system is meant to go head to head with the all-flash offerings from the likes of EMC, IBM and NetApp (I'm not aware of any optimized flash systems from HDS yet – maybe they will buy one of those new startups to fill that niche; they do have an optimized flash module for their VSP, but I'd consider that a different class of product which may retain the IOPS constraints of the VSP platform).

However, unlike the competition, who have had to go outside of their core technology, HP 3PAR has been able to bring this all flash offering under the same architecture as the spinning rust models – basically it's the same system with some tweaked software, faster processors and more memory. The underlying OS is the same, the features are the same, the administrative experience is the same. It's the same, which is important to keep in mind. This is both good and bad, though for the moment I believe more good (granted, of course, HP had to go to 3PAR to get all of this stuff, but as this blog has had a lot of 3PAR specific content I view this more in a 3PAR light than in an HP light, if you get what I mean).

Of the four major competitors, EMC is the only one that touts deduplication (which, IMO is only really useful for things like VDI in transactional workloads)

3PAR is the only one with a mature enterprise/service provider grade operating system. On top of that, obviously, 3PAR is the only one that has a common platform amongst all of its systems, from the 7200 all the way to the 10800.

3PAR and IBM are the only ones that are shipping now. Just confirmed from El Reg that the 7450 is available immediately.

None of the big four touts compression, which I think would be a greater value add than deduplication for most workloads. I'm sure it's on all of their minds though; it could be a non-trivial performance hit, and in 3PAR's case they'd likely need to implement it in the ASIC – if so, it means having to wait until the next iteration of the ASIC comes out. Gzip compression has been available in hardware form for many years, so I imagine it wouldn't be difficult to put it into the silicon to keep performance up under such conditions.

The new system also supports a 400GB MLC self encrypting drive (along with other SEDs for other 3PAR platforms as well) – 3PAR finally has a native encryption option, for those that need it.

Who should buy this

This isn’t an array for everyone (nor are the ones from the other big storage players). It’s a specialized system for specific very high performance workloads where latency is critical, yet at the same time providing the availability and manageability of the 3PAR platform to an all SSD solution.

You can probably go buy a server and stuff it with a few PCIe flash boards and meet or exceed the IOPS at a similar latency and maybe less price. If your workload is just dumb IOPS and you care about the most performance at the least price then there are other options available to you (they probably won’t work as well but you get what you (don’t) pay for).

There clearly is a market for such a product though; the first hint of this was dropped when HP announced an all flash version of its P10000 about a year ago. Customers really wanted an all flash system and they really wanted the 3PAR OS on it. If you're not familiar with the high end 3PAR systems, then from a form factor perspective – driving 400k+ SPC-1 IOPS on a P10800 vs a 7450 – you would probably get a good chuckle out of how much floor space and how many power circuits are required for the P10800 (power draw would be light with the SSDs of course, but they have hard requirements for power provisioning – most customers would pay per circuit regardless of draw).

I think a lot of this may be in the banking sector, where folks are happy to buy tons of fancy low latency stuff to make sure their transactions are processed in milliseconds.

Fifteen milliseconds may not seem like a significant amount of time—it is literally shorter than a human blink of an eye, which takes 300 to 400 milliseconds. But in the age of super-high-speed computerized trading, Wall Street firms need less than a millisecond to execute a trade.

[..]

All told, Nanex calculated that $28 million worth of shares were exchanged in a short time[15 milliseconds] before the official release of the ISM data.

The skeptics

There have been a lot of skeptics out there wondering whether or not the 3PAR architecture could be extended to cover an all flash offering (you can actually sort of count me in the skeptical camp as well – I was not sure even after they tried to reassure me; I want to see the numbers at the end of the day). I believe with this announcement they have shown that, even more so than with the 7400, they have a very solid all flash offering that will in most cases beat the tar out of the competition – not only on performance, not only on latency, not only on enterprise grade availability and functionality, but on price as well.

Even with this high performance system, these all SSD systems illustrate quite well how a modern storage controller is not able to scale anywhere near as well with SSDs as with spinning rust. Most of the SSD offerings tap out their controllers with a small number of SSDs. No single controller (that I've seen) supports the multiple millions of IOPS that would be required to drive many hundreds of SSDs at line rate simultaneously (the way regular storage arrays drive hundreds of disks today).

It is just interesting to me to see that the big bottleneck has shifted to the controller, and will stay there for some time to come. I wonder when the processors will get fast enough that they might shift the bottleneck back to the storage media – a decade? Or perhaps by that time everyone will be running on some sort of mature grid storage technology, and the notion of controllers as most of us know them today will be obsolete as a concept. Certainly several cloud providers are already trying to provide grid storage as an alternative, though in most cases, while the cost can be low, the performance is very poor as well (relative to an HP 3PAR anyway).

There is always more work to do (in this case mainly dedupe and compression), and as you might expect HP, along with the other big storage companies, is constantly working to add more. I am very excited about what the future holds for 3PAR – I really have not been this excited since the launch of the 7000 series last year (and I've been a customer now for almost seven years) – and am very pleased with what HP has managed to accomplish with the technology thus far.

Other 3PAR announcements

  • 3PAR Priority Optimization is made available now (first announced last December) – this is basically fine grained QoS for IOPS and throughput, something that will be a welcome enhancement to those running true multi tenant systems.
  • 3PAR Recovery Manager for Hyper-V – sounds like they are bringing Hyper-V up to the same level of support as VMware.
  • As mentioned earlier, self encrypting drive options cited on the Russian blog include: 400GB MLC SSD, 450GB 10k, 900GB 10k, and 1TB 7.2k 2.5″ drives.

Side note: there are a few other things to write about later, such as the IBM XIV SPC-1, the HP StoreOnce VSA, and probably whatever else comes out at Discover. For sure I won’t get to those today(or maybe even tomorrow, I am on a semi vacation/working week this week).

June 5, 2013

Real life Sim City

Filed under: Random Thought — Tags: — Nate @ 7:30 am

[WARNING: Non technical content directly ahead]

I’ve looked at Google maps a lot over the years, but don’t remember ever seeing something quite like this. I played tons of Sim City many years ago and when I saw this I was immediately reminded of Sim City. It just seems so ..familiar.

I’m planning on staying at a hotel in this town in Nevada in a couple of weeks to visit a friend who is coming in from out of town(in case you were wondering how I stumbled upon this).

This first picture reminds me of the many times I would build out a neighborhood in Sim City with the roads, zone it with light (or medium) residential, perhaps put a neighborhood school nearby – then watch the houses pop up one by one:

Real life Sim City Part 1

You can see a few individual houses here and there, and it’s pretty easy to make out what look a lot like Sim City zoned plots of land(semi square shaped), obviously with a bunch of roads that are already complete. For the most part very clean empty plots of land. Much different than what I have seen many times in the past where perhaps there is a big real estate project under development and many houses are being built simultaneously with the road being laid out.

There is another part of the town that is quite similar, again eerily reminds me of Sim City:

Sim City in real life part 2

In this case I’m again reminded of some low density residential, along with a park in the middle(well in this case the other half of the middle is not yet laid down (not in the picture above, see the google maps link). The plots are so uniform, the houses remind me so much of Sim City.

June 4, 2013

Break out the bubbly, 100k SPAM comments

Filed under: Random Thought — Nate @ 10:49 am

It seems we crossed the 100k spam comments blocked by Akismet mark. (see right side of the page)

Saved for history: crossing the 100k comment spam marker

That is just insane. 100k. I don’t even know what I would do without them. Well I guess I do — I’d have to keep comments off. That low cost annual subscription pays for itself pretty quick.

I verified on archive.org that on October 10, 2012 this site was at ~32k spam comments blocked. On May 17, 2012 only ~23k.

75,000 spam comments in about one year? For this tiny site ?

*shakes head*

Side note: for some reason HP employees are always blocked by Akismet, I don’t know why. I think they are the only ones who have contacted me saying their comments were (incorrectly) blocked.
