IBM posts XIV SPC-2 results
[UPDATED - as usual I re-read my posts probably 30 times after I post them and refine them a bit if needed, this one got quite a few changes. I don't run a newspaper here so I don't aim to have a completely thought out article when I hit post for the first time]
IBM finally came out and posted some SPC-2 results for their XIV platform, which is better than nothing but unfortunately they did not post SPC-1 results.
SPC-2 is a sequential throughput test, geared more towards things like streaming media and data warehousing than towards random I/O, which represents a more typical workload.
The numbers are certainly very impressive though, coming in at 7.3 gigabytes/second, besting most other systems out there. That works out to 42 megabytes/second per disk. IBM's earlier high end storage array was only able to eke out 12 megabytes/second per disk (with 4 times the number of disks), using disks that were twice as fast. So at least 8 times the I/O capacity, for only about 25% more performance vs XIV. That's a stark contrast!
SATA/Nearline/7200RPM SAS disks are typically viewed as good at sequential operations, though I would expect 15k RPM disks to do at least as well, since the faster RPM should result in more data traveling under the head at a faster rate. Perhaps this is a sign of a good architecture in XIV with its distributed mirrored RAID.
While the results are quite good, again they don't represent the most common type of workload out there, which is random I/O.
The $1.1M discounted price of the system seems quite high for something that only has 180 disks on it (discounts on the system seem to be 70% for the most part), though there is more than 300 gigabytes of cache. I bought a 2-node 3PAR T400 with 200 SATA disks shortly after the T was released in 2008 for significantly less, though of course it only had 24GB of data cache!
I hope the $300 modem IBM(after 70% discount) is using is a USR Courier! (Your Price: $264.99 - still leaves a good profit for IBM). Such fond memories of the Courier.
I can only assume that at this point in time IBM has refrained from posting SPC-1 results because with a SATA-only system the results would not be impressive. In a fantasy world with nearline disks and a massive 300GB cache maybe they could achieve 200-250 IOPS/disk, which would put the $1.1M system with 180 disks at 36,000 - 45,000 SPC-1 IOPS, or $24-30/IOP.
A more realistic number is probably going to be 25,000 or less ($44/IOP), making it one of the most expensive systems out there for I/O (even if it could score 45,000 SPC-1 IOPS). By contrast, 3PAR would do 14,000 IOPS (not SPC-1 IOPS mind you, the SPC-1 number would probably be lower) with 180 SATA disks and RAID 10, based on their I/O calculator with an 80% read/20% write workload, for about 50% less cost (after discounts) for a 4-node F400.
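To make the back-of-the-envelope math above explicit, here is a minimal Python sketch. The price, disk count and per-disk IOPS guesses are the ones quoted in this post; the function name `spc1_estimate` is made up for illustration, and none of this is a real SPC-1 result.

```python
def spc1_estimate(price_usd, disks, iops_per_disk):
    """Return (total IOPS, dollars per IOP) for a guessed per-disk IOPS rate."""
    total_iops = disks * iops_per_disk
    return total_iops, price_usd / total_iops

PRICE_USD = 1_100_000  # discounted XIV price quoted in the post
DISKS = 180            # disks in the tested XIV

fantasy_low = spc1_estimate(PRICE_USD, DISKS, 200)
fantasy_high = spc1_estimate(PRICE_USD, DISKS, 250)
realistic_cost = PRICE_USD / 25_000  # the "more realistic" 25,000 IOPS guess

print(f"200 IOPS/disk: {fantasy_low[0]:,} IOPS at ${fantasy_low[1]:.2f}/IOP")
print(f"250 IOPS/disk: {fantasy_high[0]:,} IOPS at ${fantasy_high[1]:.2f}/IOP")
print(f"25,000 IOPS:   ${realistic_cost:.2f}/IOP")
```

The low end of the fantasy range actually lands a little over $30/IOP ($30.56), so the $24-30 figure is a rounded one.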
One of the weak spots on 3PAR is the addressable capacity per controller pair. For I/O and disk connectivity purposes a 2-node F200 (much cheaper) could easily handle 180 2TB SATA disks, but from a software perspective that is not the case. I have been complaining about this for more than 3 years now. They've finally addressed it to some extent in the V-class, but I am still disappointed by the extent to which it has been addressed, given the supported limits that exist today (1.6PB, should be more than double that). At least with the V they have enough memory on the box to scale it up with software upgrades (time will tell if such upgrades come about, however).
I would not even use an F400 for this if it were me, opting instead for a T800 (800TB) or a V class (800-1600TB), because with 360TB raw on the system that is very close to the limit of the F400's addressable capacity (384TB), or the T400's (400TB). You could of course get a 4-node T800 (or a 2-node V400 or V800) to start, then add additional controllers to get beyond 400TB of capacity if/when the need arises. With the 4-controller design you also get the wonderful persistent cache feature built in (one of the rare software features that is not separately licensed).
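How close 360TB raw sits to those limits is easy to check. A quick Python sketch, using only the disk count, disk size and addressable-capacity limits quoted in this post (the variable names are just for illustration):

```python
DISKS = 180
DISK_TB = 2                # 2TB SATA disks
raw_tb = DISKS * DISK_TB   # 360TB raw

# Addressable capacity limits (TB) as quoted in this post
limits_tb = {"F400": 384, "T400": 400, "T800": 800}

headroom = {}
for model, limit in limits_tb.items():
    headroom[model] = limit - raw_tb
    print(f"{model}: {raw_tb / limit:.1%} used, {limit - raw_tb}TB headroom")
```

The F400 would be left with only 24TB of room to grow, which is the reason for leaning towards the bigger boxes.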
But for this case, comparing a nearly maxed out F400 against a maxed out XIV is still fair - it is one of the main reasons I did not consider XIV during my last couple storage purchases.
So there is a strong use case of when to use XIV with these results - throughput oriented workloads! The XIV would absolutely destroy the F400 in throughput, which tops out at 2.6GB/sec (to disk).
With software such as Vertica out there, which slashes the need for disk I/O on data warehouses given its advanced design, and systems such as Isilon being so geared towards things like scale out media serving (NFS seems like a more ideal protocol for media serving anyways), I can't help but wonder what XIV's place is in the market, at this price point at least. It does seem like a very nice platform from a software perspective, and with their recent switch to InfiniBand from 1 Gigabit Ethernet a good part of their hardware has been improved as well. It also has SSD read cache coming.
I will say though that this XIV system will handily beat even a high end 3PAR T800 for throughput. While 3PAR has never released SPC-2 numbers, the T800 tops out at 6.4 gigabytes/second (from disk), and it's quite likely its SPC-2 results would be lower than that.
With the 3PAR architecture being as optimized as it is for random I/O, I do believe it would suffer vs other platforms with sequential I/O. Not that the 3PAR would run slow, but it would quite likely run slower due to how data is distributed on the system. That is just speculation though, a result of not having real numbers to base it on. My own production random I/O workloads in the past have had 15k RPM disks running in the range of 3-4MB/second (numbers are extrapolated, as I have only had SATA and 10k RPM disks in my 3PAR arrays to date, though my new one that is coming is 15k RPM). As such, with a random I/O workload you can scale up pretty high before you run into any throughput limits on the system (in fact if you max out a T800 with 1,280 drives you could do as high as 5MB/second/disk before you would hit the limit). Though XIV is distributed RAID too, so who knows.
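The 5MB/second/disk ceiling is just the T800's from-disk limit divided across a full load of drives. A small Python sketch of that, assuming decimal units (1GB = 1000MB) and the figures quoted in this post:

```python
T800_LIMIT_MB_S = 6400   # 6.4 gigabytes/second from disk, assuming 1GB = 1000MB
MAX_DRIVES = 1280

ceiling_per_disk = T800_LIMIT_MB_S / MAX_DRIVES
print(f"Per-disk ceiling on a full T800: {ceiling_per_disk:.1f} MB/s")

# Aggregate throughput if every disk ran at the 3-4MB/second seen in production
aggregate = {rate: rate * MAX_DRIVES for rate in (3, 4)}
for rate, total in aggregate.items():
    print(f"{rate} MB/s per disk -> {total / 1000:.2f} GB/s of {T800_LIMIT_MB_S / 1000:.1f} GB/s")
```

Even at 4MB/second on every one of the 1,280 drives the system would still be under its throughput limit.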
Likewise I suspect 3PAR/HP have not released SPC-2 numbers because it would not reflect their system in the most positive light, unlike SPC-1.
Sorry for the tangents on 3PAR
HP Storage strategy – some hits, some misses
[UPDATED - Some minor updates since my original post] I was at an Executive Briefing by HP, given by HP's head of storage, and former 3PAR CEO David Scott. I suppose this is one good reason to be in the Bay Area - events like this didn't really happen in Seattle.
I really didn't know what to expect. I was looking forward to seeing David speak as I had not heard him before; his accent, oddly enough, surprised me.
He covered a lot of various topics, it was clear of course he was more passionate about 3PAR than Lefthand, or Ibrix or XP or EVA, not surprising.
The meeting was not technical enough to get any of my previously mentioned questions answered; it seemed very geared towards the PHB crowd.
HP Storage hits
3PAR
One word: Duh. The crown jewel of the HP storage strategy.
He emphasized over and over the 3PAR architecture and how it's the platform powering 7 of the top 10 clouds out there, the design that lets them handle unpredictable workloads, you know this stuff by now, I don't need to repeat it.
David pointed out an interesting tidbit with regards to their latest SPC-1 announcement, he compared the I/O performance and cost per usable TB of the V800 to a Texas Memory Systems all-flash array that was tested earlier this year.
The V800 outperformed the TMS system by 50,000 IOPS, and came in at a cost per usable TB of only about $13,000 vs $50,000/TB for the TMS box.
Cost per I/O, which he did not mention, certainly favored the TMS system ($1.05), but the comparison was still a good one I thought - we can give you the performance of flash and still give you a metric ton of disk space at the same time. Well, if you want to get technical, I guesstimate the fully loaded V800 weighs in at 13,160 pounds or about 6 metric tons.
Of course flash certainly has its use cases. If you don't have a lot of data it doesn't make sense to invest in 2,000 spinning rust buckets to get to 450,000 IOPS.
Peer Motion
Peer motion is both a hit and a miss. A hit because it's a really neat technology, the ability to non-disruptively migrate between storage systems without 3rd party appliances. The miss, well, I'll talk about that below.
He compared peer motion to the likes of Hitachi's USP/VSP, IBM's SVC, and EMC's VPLEX, which are all expensive, complicated bolt-on solutions. Seemed reasonable.
Lefthand VSA
It's a good concept, and it's nice to see it as a supported platform. David mentioned that VMware themselves originally tried to acquire Lefthand (or wanted to acquire it, I don't know if anything official was made) because they saw the value in the technology - and of course recently VMware introduced something kinda-sorta-similar to the Lefthand VSA in vSphere 5, though it seems not quite as flexible or as scalable.
I'm not sure I see much value in the P4000 appliances by contrast, I hear that doing RAID 5 or worse yet RAID 6 on P4000 is just asking for a whole lotta pain.
StoreOnce De-duplication
It sounds like it has a lot of promise, I'll put it in the hit column for now as it's still a young technology and it'll take time to see where it goes. But the basic concept is a single de-duplication technology for all of your data. He contrasted this with EMC's strategy for example, where they have client side de-dupe with their software backup products, inline de-dupe with Data Domain, and primary storage dedupe -- none of which are compatible with each other. Who knows, by the time HP gets it right with StoreOnce maybe EMC and others will get it right too.
I'm still not sold myself on the advantages of dedupe outside of things like backups and VDI. I've sat through what seems like a dozen NetApp presentations on the topic so I have had the marketing shoved down my throat many times. I came to this realization a few years ago during an eval test of some Data Domain gear. I'll be honest and admit I did not fully comprehend the technicals behind de-duplication at the time, and I expected pretty good results from feeding it tens of gigabytes of uncompressed text data. It turns out I was wrong, and I realized why my assumption was incorrect to begin with.
Now data compression on the other hand is another animal entirely, being able to support in line data compression without suffering much or any I/O hit really would be nice to see (I am aware there are one/more vendors out there that offer this technology now).
HP Storage Misses
Nobody is perfect, HP and 3PAR are no exception no matter how much I may sing praises for them here.
Peer Motion
When I first heard about this technology being available on both the P4000 and 3PAR platforms I assumed that the two were compatible with each other, meaning you could peer motion data to/from P4000 and 3PAR. One of my friends at 3PAR clarified for me a few weeks ago that this was not the case, and David Scott mentioned that again today.
He tried to justify it by comparing it to vSphere vMotion, where you can't do a vMotion between a vSphere system and a Hyper-V system. He could have gone deeper and said you can't do vMotion even between vSphere hosts if the CPUs are not compatible, which would have been a better example.
So he said that most federation technologies are usually homogeneous in nature, and you should not expect to be able to peer motion from a HP P4000 to a HP 3PAR system.
Where HP's argument kind of falls apart here is that the bolt on solutions he referred to as inferior previously do have the ability to migrate data between systems that are not the same. It may be ugly, it may be kludgey, but it can work. Hitachi even lists 3PAR F400, S400 and T800 as supported platforms behind the USP. IBM lists 3PAR and HP storage behind their SVC.
So, what I want from HP is the ability to do peer motion between at least all of their higher end storage platforms (I can understand if they never have peer motion on the P2000/MSA since it's just a low end box). I'm not willing to accept any excuses, other than "sorry, we can't do it because it's too complicated". Don't tell me I shouldn't expect to have it, I fully expect to have it.
Just another random thought, but when I think of storage federation and homogeneous, I can't help but think of this scene from Star Trek VI:
GORKON: I offer a toast. ...The undiscovered country, ...the future.
ALL: The undiscovered country.
SPOCK: Hamlet, act three, scene one.
GORKON: You have not experienced Shakespeare until you have read him in the original Klingon.
CHANG: (in Klingonese) 'To be or not to be.'
KERLA: Captain Kirk, I thought Romulan ale was illegal.
KIRK: One of the advantages of being a thousand light years from Federation headquarters.
McCOY: To you, Chancellor Gorkon, one of the architects of our future.
ALL: Chancellor!
SCOTT: Perhaps we are looking at something of that future here.
CHANG: Tell me, Captain Kirk, would you be willing to give up Starfleet?
SPOCK: I believe the Captain feels that Starfleet's mission has always been one of peace.
CHANG: Ah.
KIRK: Far be it for me to dispute my first officer. Starfleet has always been...
CHANG: Come now, Captain, there's no need to mince words. In space, all warriors are cold warriors.
UHURA: Er. General, are you fond of ...Shakes ....peare?
CHEKOV: We do believe all planets have a sovereign claim to inalienable human rights.
AZETBUR: Inalien... If only you could hear yourselves? 'Human rights.' Why the very name is racist. The Federation is no more than a 'homo sapiens' only club.
CHANG: Present company excepted, of course.
KERLA: In any case, we know where this is leading. The annihilation of our culture.
McCOY: That's not true!
KERLA: No!
McCOY: No!
CHANG: 'To be, or not to be!', that is the question which preoccupies our people, Captain Kirk. ...We need breathing room.
KIRK: Earth, Hitler, nineteen thirty-eight.
CHANG: I beg your pardon?
GORKON: Well, ...I see we have a long way to go.
For the most basic workloads it's not such a big deal if you have vSphere and storage vMotion (or some other similar technology). You cannot fully compare storage vMotion with peer motion but for offering the basic ability to move data live between different storage platforms it does (mostly) work.
HP Scale-out NAS (X9000)
I want this to be successful, I really do. Because I like to use 3PAR disks and well there just aren't many NAS options out there these days that are compatible. I'm not a big fan of NetApp, I very reluctantly bought a V3160 cluster to try to replace an Exanet cluster on my last 3PAR box because well Exanet kicked the bucket(not the product we had installed but the company itself). I left the company not long after that, and barely a year later the company is already going to abandon NetApp and go with the X9000 (of all things!). Meanwhile their unsupported Exanet cluster keeps chugging along.
Back to X9000. It sounds like a halfway decent product. They say the main thing they lacked was snapshot support and that is there now (or will be soon). Kind of strange, Ibrix has been around for how long and they did not have file system snapshots till now? I really have not heard much about Ibrix from anyone other than HP, who obviously sing the praises of the product.
I am still somewhat bitter about 3PAR not buying Exanet when they had the chance, Exanet is a far better technology than Ibrix. Exanet was sold for, if I remember right, $12 million, a drop in the bucket. Exanet had deals on the table (at the time) that would have brought in more than $12 million in revenue each. Multi petabyte deals. Here is the Exanet Architecture (and file system), as it stood in 2005. In my opinion it is very similar to the 3PAR approach (completely distributed, clustered design - Exanet treats files like 3PAR treats chunklets), except Exanet did not have any special ASICs, everything was done in software. Exanet had more work to do on their product, it was far from perfect, but it had a pretty solid base to build upon.
So, given that, I do hope X9000 does well. I mean my needs are not that great. What I'd really like to see is a low end VSA for the X9000 along the lines of their P4000 iSCSI VSA. Just for simple file storage in an HA fashion. I don't need to push 30 gigabits/second, just want something that is HA, has decent performance and is easy to manage.
Legacy storage systems (EVA especially)
Let it die already. HP has been adamant they will continue to support and develop the EVA platform for their legacy customers. That sort of boggles my mind. Why waste money on that dead end platform? Use the money to give discounted incentives to upgrade to 3PAR when the time comes. I can understand supporting existing installs, bug fix upgrades, but don't waste money on bringing out whole new revisions of hardware and software to this dead end product. David said so himself - supporting the install base of EVA and XP is supporting their 11% market share; for the other 89% of the market that they don't have, they are going to push 3PAR/Lefthand/Ibrix.
I would find a way to craft a message to that 11% install base, subliminal messaging (ala Max Headroom, not sure why that came to my head) make them want to upgrade to a 3PAR box, not the next EVA system.
XP/P9500 I can kinda sorta see keeping around, I mean there are some things it is good at that even 3PAR can't do today. But the market for such things is really tiny, and shrinking all the time. Maybe HP doesn't put much effort into this platform because it is OEM'd from Hitachi, in which case it doesn't cost a lot to re-sell, in which case it doesn't make a big difference if they keep selling it or stopped selling it. I don't know.
I can just see what goes through a typical 3PAR SE's mind (at least those that were present before HP acquired 3PAR) when they are faced with selling an EVA. If the deal closes perhaps they scream NOooooooooooooooooooo like Darth Vader in Return of the Jedi. Sure they'd rather have the customer buy HP than go buy a Clariion or something. But throw these guys a bone. Kill EVA and use the money to discount 3PAR more in the marketplace.
P2000/MSA - gotta keep that stuff, there will probably always be some market for DAS
Insights
I had the opportunity to ask some high level questions of David and got some interesting responses, he seems like a cool guy
3PAR Competition
David harped a lot on how other storage architectures from the big manufacturers were designed 15-20 years ago. I asked him - why does he think - given 3PAR technology is 10+ years old at this point that these other manufacturers haven't copied it to some degree? It has obvious technological advantages it just baffles me why others haven't copied it.
His answer came down to a couple of things. The main point was 3PAR was basically just lucky. They were in the right place, at the right time, with the right product. They successfully navigated the tech recession when at least two other utility storage startups could not and folded (I forgot their names, I'm terrible with names). He said the big companies pulled back on R&D spending as a result of the recession and as such didn't innovate as much in this area, which left a window of opportunity for 3PAR.
He also mentioned two other companies that were founded at about the same time to address the same market - utility computing. He mentioned VMware as one of them, the other was the inventor of the first blade system, forgot the name. VMware I think I have to dispute though. I seem to recall VMware "stumbling" into the server market by accident rather than targeting it directly. I mean I remember using VMware before it was even VMware Workstation or GSX. It was just a program used to run another OS on top of Linux (that was the only OS VMware ran on at the time). I recall reading that the whole server virtualization movement came way later and caught VMware off guard, as much as it caught everyone else off guard.
He also gave an example in EMC and their VMAX product line. He said that EMC misunderstood what the 3PAR architecture was about - in that they thought it was just a basic cluster design, so EMC re-worked their system to be a cluster - the result is VMAX. But it still falls short in several design aspects, EMC wasn't paying attention.
I was kind of underwhelmed when the VMAX was announced, I mean sure it is big, and bad, and expensive, but they didn't seem to do anything really revolutionary in it. Same goes for the Hitachi VSP. I fully expected both to do at least some sort of sub disk distributed RAID. But they didn't.
Utilizing flash as a real time cache
David harped a lot on 3PAR's ability to be able to respond to unpredictable workloads. This is true, I've seen it time and time again, it's one reason why I really don't want to use any other storage platform at this point in time given the opportunity.
Something I thought really was innovative that came out of EMC in the past year or two is their Flash Cache product (I think that's the right name), the ability to use high speed flash as both a read and a write cache. The ability to bulk the cache levels up into the multiples of terabytes for big cloud operations.
His response was - we already do that - with RAM cache. I clarified a bit more, saying I meant scaling out the cache even more with flash, well beyond what you can do with RAM. He kind of ducked the question, saying it was a bit too technical/architectural for the crowd in the room. 3PAR needs to have this technology. My key point to him was that the 3PAR tools like Adaptive Optimization and Dynamic Optimization are great tools - but they are not real time. I want something that is real time. It seemed he acknowledged that point - the lack of the real time nature of the existing technologies as a weak point - hopefully HP/3PAR addresses it soon in some form.
In my previous post, Gabriel commented on how the new next gen IBM XIV will be able to have up to 7.5TB of read cache via SSD. I know NetApp can have a couple TB worth of read cache in their higher end boxes. As far as I know only EMC has the technology to do both read and write. I can't say how well it works, since I've never used it and know nobody that has this EMC gear, but it is a good technology to have, especially as flash matures more.
I just think how neat it would be to have, say, a 1.5-2PB system running SATA disks with an extra 100TB (roughly 5-7% of total storage) of flash cache on top of it.
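For a sense of scale, here is a quick Python sketch of what 100TB of flash amounts to against 1.5PB and 2PB of disk. It assumes decimal units (1PB = 1000TB), and the sizes are just the hypothetical ones from the paragraph above:

```python
FLASH_TB = 100

flash_share = {}
for total_pb in (1.5, 2.0):
    total_tb = total_pb * 1000  # assuming 1PB = 1000TB
    flash_share[total_pb] = FLASH_TB / total_tb
    print(f"{total_pb}PB of disk: flash cache is {flash_share[total_pb]:.1%} of capacity")
```

That comes to about 6.7% of a 1.5PB system and 5% of a 2PB one.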
Bringing storage intelligence to the application layer
Another question I asked him was his thoughts around a broader industry trend which seems to be trying to bring the intelligence of storage out of the storage system and put it higher up in the stack - given the advanced functionality of a 3PAR system are they threatened at all by this? The examples I gave were Exchange and Oracle ASM.
He focused on Oracle, mentioning that Oracle was one of the original investors in 3PAR and as a result there was a lot of close collaboration between the two companies, including the development of the ASM technology itself.
He mentioned one of the VPs of Oracle, I forget if he was a key ASM architect or developer or something, but someone high up in the storage strategy involving ASM -- in the early days this guy was very gung ho, absolutely convinced that running the world on DAS with ASM was the way to go. Don't waste your money on enterprise storage, we can do it higher in the stack and you can use the cheap storage, save yourself a lot of money.
David said once this Oracle guy saw 3PAR storage powering an Oracle system he changed his mind, he no longer believed that DAS was the future.
The key point David was trying to make was - bringing storage intelligence higher up in the stack is OK if your underlying storage sucks. But if you have a good storage system, you can't really match that functionality/performance/etc that high up in the stack and it's not worth considering.
Whether he is right or not is another question, for me I think it depends on the situation, but any chance I get I will of course lean towards 3PAR for my back end disk needs rather than use DAS.
In short - he does not feel threatened at all by this "trend". Though if HP is unwilling or unable to get peer motion working between their products when things like Storage vMotion and Oracle ASM can do this higher up in the stack, there certainly is a case for storage intelligence at the application layer.
Best of Breed
David also seemed to harp a lot about best of breed. He knocked his competitors for having a mishmash of technologies, specifically he mentioned market leading technologies instead of best of breed. Early in his presentation he touted HP's market leading position in servers, and their #2 position in networking (you could say that is market leading).
He also tried to justify that the HP integrated cloud stack is comprised of best of breed technologies, it just happens to be two out of the three are considered market leading, no coincidence there.
Best of breed is really a perception issue when you get down to it. Where do you assign value in the technology? Do you just want the cheapest you can get? Do you want the most advanced software? Do you want the fastest system? Do you want the most reliable? Ease of use? Interoperability? Flexibility? Buzzword compliance? A big name brand?
Because of that, many believe these vertically integrated stacks won't go very far. There will be some customer wins of course, but more often than not those will be wins based not on technology but on other factors: political (most likely), financial (buy from us and we'll finance it all, no questions asked), or maybe just the warm and fuzzy feeling incompetent CIOs get when they buy from a big name that says they will stand behind the products.
I did ask David what is HP's stance on a more open design for this "cloud" thing. Not building a cloud based on a vertically integrated stack. His response was sort of what I expected - none of the other stack vendors are open, we aren't either so we don't view it as an important point.
I was kind of sad that he never used the 3cV term, really. I think it was likely the first stack that was out there, and it wasn't even official, there was no big marketing or sales push behind it.
For me, my best of breed storage is 3PAR, it may have 1 or 2% market share (more now), so it surely is not market leading(might make it there with HP behind it), but for my needs it's best of breed.
Switching, likewise, Extreme - maybe 1 - 1.5% market share, not market leading either, but for me, best of breed.
Fibre Channel - I like Qlogic. Probably not best of breed, certainly not market leading at least for switches, but damn easy to use and it gets the job done for me. Ironically enough while digging up links for this post I came across this, which is an article suggesting Qlogic should buy Extreme, back in 2009. I somewhat fear, the most likely company to buy Extreme at this point is Oracle. I hope Oracle does not buy them, but Oracle is trying to play the whole stack game too and they don't really have any in house networking, unlike the other players. Maybe Oracle will jump on someone like Arista instead, be a cheaper price, ala Pillar.
Servers - I do like HP best of course for the enterprise space - they don't compete as well in scale out though.
VMware on the other hand happens to be in a somewhat unique position, being both the market leader and for the most part best of breed, though others are rapidly closing the gap in the breed area. VMware had many years with no competition.
Summary
All in all it was pretty good, a lot more formal than I was expecting, I saw 3 people I knew from 3PAR, I sort of expected to see more (I fully was expecting to see Marc Farley there! where were you Marc!).
David did harp on quite a bit about using Intel processors, something that Chuck from EMC likes to harp on too. I did not ask this question of David, because I think I know the answer. The question would be: does he think HP will migrate to Intel CPUs, and away from their purpose built ASIC? I think the answer to that question is no, at least not in the next few years (barring some revolution in general purpose processors). With 3PAR's distributed design I'm just not sure how a general purpose CPU could handle calculating the parity for what could be as many as say half a million RAID arrays on a storage system like the V800 without the assistance of an ASIC or FPGA. I really do not like HP's pushing of the Intel brand so much, being a partner of HP and all, at least with regards to 3PAR. Because really - the Intel CPU does not do much at all in the 3PAR boxes, it never has.
Just look at IBM XIV - they do distributed RAID, though they do mirroring only, and they top out at 180 disks, even with 60 2.4GHz Intel CPU cores (120 threads), with a combined 180MB of CPU cache. Mirroring is a fairly simple task vs doing parity calculations.
Frankly, I'd rather see an AMD processor in there, especially one with 12-16 cores. The chipsets that power the higher end Intel processors are fairly costly, whereas AMD's chipset scales down to 1 CPU socket without an issue. I look at a storage system that has dual or quad core CPUs in it and I think what a waste. Things may be different if the storage manufacturers included the latest & greatest Intel 8 and 10 core processors, but thus far I have not seen anything close to that.
David also mentioned a road VMware is traveling, moving away from file systems to support VMs to a 1:1 relationship between LUNs and VMs, making life a lot more granular. He postulated (there's a word I've never used before) that this technology(forgot the name) will make protocols like NFS obsolete(at least when it comes to hosting VMs), which was a really interesting concept to me.
At the end of the day, for the types of storage systems I have managed myself, for the types of companies I have worked for I don't get enough bang for the buck out of many of the more advanced software technologies on these storage systems. Whether it is simple replication stuff, or space reclamation, I'm still not sold on Adaptive optimization or other automatic storage tiering techniques, even application aware snapshots. Sure these are all nice features to have, but they are low on my priority list, I have other things I want to buy first before I invest in these things, myself at least. If money is no object - sure, load up on everything you got! I feel the same way about VMware and their software value add strategy. Give me the basics and let me go from there. Basics being a solid underlying system that is high performance, predictable and easy to manage.
There was a lot of talk about cloud and their integrated stacks and stuff like that but that was about as interesting to me as sitting through a NetApp presentation. At least with most of the NetApp presentations I sat through I got some fancy steak to go with it, just some snacks at this HP event.
One more question I have for 3PAR - what the hell is your service processor running that requires 317W of power! Is it using Intel technology circa 2004?
This actually ended up being a lot longer than I had originally anticipated, nearly 4200 words!
Linear scalability
So 3PAR released their SPC-1 results for their Mac daddy P10000, and the results aren't as high as I originally guessed they might be.
HP claims it is a world record result for a single system. I haven't had the time yet to try to verify but they are probably right.
I'm going to a big HP/3PAR event later today and will ask my main question - was the performance constrained by the controllers or by the disks? I'm thinking disks, given the IOPS/disk numbers below.
Here are some of the results:
| System | Date Tested | SPC-1 IOPS | IOPS per Disk | SPC-1 Cost per IOP | SPC-1 Cost per usable TB |
|---|---|---|---|---|---|
| 3PAR V800 | 10/17/2011 | 450,212 | 234 | $6.59 | $12,900 |
| 3PAR F400 | 4/27/2009 | 93,050 | 242 | $5.89 | $20,308 |
| 3PAR T800 | 9/2/2008 | 224,989 | 175 | $9.30 | $26,885 |
The cost per TB number was slashed because they are using disks that are much larger (300GB vs 147GB on earlier tests).
The cost was pretty reasonable as well coming in at under $7/IOP which is actually less than their previous results on their T800 from 2008 which was already cheap at $9.30/IOP.
It is interesting that they used Windows to run the test, which is a first for them I believe, having used real Unix in the past (AIX and Solaris for T800 and F400 respectively).
The one kind of strange thing, which is typical in 3PAR SPC-1 numbers is the sheer number of volumes they used (almost 700). I'm not sure what the advantage would be to doing that, another question I will try to seek the answer to.
The system was, as expected, remarkably easy to configure. The entire storage configuration process consisted of this:
for n in {0..7}
do
  for s in 1 4 7
  do
    if(($s==1))
    then
      for p in 4
      do
        controlport offline -f $n:$s:$p
        controlport config host -ct point -f $n:$s:$p
        controlport rst -f $n:$s:$p
      done
    fi
    for p in 2
    do
      controlport offline -f $n:$s:$p
      controlport config host -ct point -f $n:$s:$p
      controlport rst -f $n:$s:$p
    done
  done
done
PORTS[0]=":7:2"
PORTS[1]=":1:2"
PORTS[2]=":1:4"
PORTS[3]=":4:2"
for nd in {0..7}
do
  createcpg -t r1 -rs 120 -sdgs 120g -p -nd $nd cpgfc$nd
  for hba in {0..3}
  do
    for i in {0..14} ; do
      id=$((1+60*nd+15*hba+i))
      createvv -i $id cpgfc${nd} asu1.${id} 240g;
      createvlun -f asu1.${id} $((15*nd+i+1)) ${nd}${PORTS[$hba]}
    done
    for i in {0..1} ; do
      id=$((681+8*nd+2*hba+i))
      j=$((id-680))
      createvv -i $id cpgfc${nd} asu3.${j} 360g;
      createvlun -f asu3.${j} $((2*nd+i+181)) ${nd}${PORTS[$hba]}
    done
    for i in {0..3} ; do
      id=$((481+16*nd+4*hba+i))
      j=$((id-480))
      createvv -i $id cpgfc${nd} asu2.${j} 840g;
      createvlun -f asu2.${j} $((4*nd+i+121)) ${nd}${PORTS[$hba]}
    done
  done
done
Think about that, a $3 million storage system(after discount) configured in less than 50 lines of script?
It's not a typical way to configure a system. I had to look at it a couple of times, but it seems they are still pinning volumes to particular controller pairs, and LUNs to particular FC ports. This is what they have done in the past so it's nothing new, but I would like to see how the system runs without such pinning of resources, letting the inter-node routing do its magic, since that is how customers would run the system.
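To get a feel for what those nested loops actually create, here is a rough Python sketch that replays the same arithmetic as the `createvv` lines in the script. It is not 3PAR tooling, just the loop math; the function name `build_volumes` and the dictionary field names are made up for illustration, and the sizes are simply the numbers in front of the `g` suffix in the script.

```python
from collections import Counter

# Volume sizes copied from the createvv lines (the number before the "g" suffix).
SIZE_G = {"asu1": 240, "asu2": 840, "asu3": 360}


def build_volumes():
    """Replay the nested loops of the script and list every volume it would create."""
    volumes = []
    for nd in range(8):              # for nd in {0..7}  (controller nodes)
        for hba in range(4):         # for hba in {0..3} (entries in PORTS)
            for i in range(15):      # asu1 volumes
                vid = 1 + 60 * nd + 15 * hba + i
                volumes.append({"id": vid, "name": f"asu1.{vid}", "asu": "asu1", "node": nd})
            for i in range(2):       # asu3 volumes
                vid = 681 + 8 * nd + 2 * hba + i
                volumes.append({"id": vid, "name": f"asu3.{vid - 680}", "asu": "asu3", "node": nd})
            for i in range(4):       # asu2 volumes
                vid = 481 + 16 * nd + 4 * hba + i
                volumes.append({"id": vid, "name": f"asu2.{vid - 480}", "asu": "asu2", "node": nd})
    return volumes


volumes = build_volumes()
per_asu = Counter(v["asu"] for v in volumes)
per_node = Counter(v["node"] for v in volumes)
total_g = sum(SIZE_G[v["asu"]] for v in volumes)

print("volumes:", len(volumes))
print("per ASU:", dict(per_asu))
print("per node:", dict(per_node))
print("total provisioned (g):", total_g)
```

By this count the script creates 672 volumes spread evenly over the 8 nodes, which lines up with the "almost 700" volumes mentioned earlier.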
But that's what full disclosure is all about right! Another reason I like the SPC-1, is the in depth configuration information that you don't need an NDA to see(and in 3PAR's case you probably don't need to attend a 3-week training course to understand!)
I'm trying to think of one but I can't think of another storage architecture out there that scales as well as the 3PAR Inspire architecture from the low end(F200) to the high end(V800).
The cost of the V800 was a lot more reasonable than I was fearing it might be, it's only roughly 45% more expensive than the T800 tested in September 2008, for that extra 45% you get 50% more disks, double the I/O capacity, almost three times the usable capacity. Oh, and five times more data cache, and 8 times more control cache to boot!
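As a sanity check on those ratios, here is a small Python sketch that back-derives the tested price and disk count from the table above (IOPS times cost per IOP, and IOPS divided by IOPS per disk). The helper names are my own, and because the table values are rounded the derived numbers are approximate rather than the figures from the full disclosure reports.

```python
# Figures copied from the results table above.
RESULTS = {
    "V800": {"iops": 450_212, "iops_per_disk": 234, "cost_per_iop": 6.59},
    "T800": {"iops": 224_989, "iops_per_disk": 175, "cost_per_iop": 9.30},
}


def implied_price(result):
    """Approximate tested price: SPC-1 IOPS multiplied by cost per IOP."""
    return result["iops"] * result["cost_per_iop"]


def implied_disks(result):
    """Approximate disk count: SPC-1 IOPS divided by IOPS per disk."""
    return result["iops"] / result["iops_per_disk"]


v800, t800 = RESULTS["V800"], RESULTS["T800"]
price_ratio = implied_price(v800) / implied_price(t800)
disk_ratio = implied_disks(v800) / implied_disks(t800)
iops_ratio = v800["iops"] / t800["iops"]

print(f"V800 implied price: ${implied_price(v800):,.0f}")
print(f"price ratio: {price_ratio:.2f}x, disks: {disk_ratio:.2f}x, IOPS: {iops_ratio:.2f}x")
```

From the rounded table values the price premium comes out a little over 40%, in the same ballpark as the rough 45% above, with about 1.5x the disks and 2x the IOPS.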
I'm suspecting the ASICs are not being pushed to their limits here in the V800, and that the system can go quite a bit faster provided there is not an I/O bottleneck on the disks behind the controllers.
On the backs of these numbers The Register is reporting HP is beefing up the 3PAR sales team after having experienced massive growth over the past year. It seems to be at least roughly a 300% increase in sales, so much that they are having a hard time keeping up with demand.
I haven't been happy with the hefty price increases HP has put into the 3PAR product line though in a lot of cases those come back out in the form of discounts. I guess it's what the market will bear right - as long as things are selling as fast as they can make them HP doesn't have any need to reduce the price.
I saw an interview with the chairman of HP a few weeks ago, when they announced their new CEO. He mentioned how 3PAR had significantly exceeded their sales expectations, as justification for paying that lofty price to acquire them about a year ago.
So congrats to 3PAR, I knew you could do it!
Fusion IO enhances MySQL performance further
This seems pretty neat. Not long ago Fusion IO announced their first new real product refresh in quite a while which offers significantly enhanced performance.
Today I see another related article that goes into something more specific, from Data Center Knowledge -
Fusion-io also announced a new extension to its VSL (Virtual Storage Layer) software subsystem for conducting Atomic Writes in the popular MySQL open source database. Atomic Writes are an operation in which a processor can simultaneously write multiple independent storage sectors as a single storage transaction. This accelerates mySQL and gives new features powered by the flexibility of sophisticated flash architectures. With the new Atomic Writes extension, Fusion-io testing has observed 35 percent more transactions per second and a 2.5x improvement in performance predictability compared to conducting the same MySQL tests without the Atomic Writes feature.
I know that Facebook is a massive user of Fusion IO for their MySQL database farms, I suspect this feature was made for them! Though it can benefit everyone.
My only question would be: can this Atomic Writes capability be used by MySQL when running through the ESX storage layer, or does there need to be more native access from the OS?
About the new product lines, from The Register -
The ioDrive 2 comes in SLC form with capacities of 400GB and 600GB. It can deliver 450,000 write IOPS working with 512-byte data blocks and 350,000 read IOPS. These are whopping great increases, 3.3 times faster for the write IOPS number, over the original ioDrive SLC model which did 135,000 write IOPS and 140,000 read IOPS. It delivered sequential data at 750-770MB/sec whereas the next-gen product does it at 1.5GB/sec, around two times faster.
[..]
All the products will ship in November. Prices start from $5,950
The cards aren't available yet, wonder how accurate those numbers will end up being? But in any case, even if they were over inflated by a large amount that's still an amazing amount of I/O.
On a related note I was just browsing the Fusion IO blog which mentions this MySQL functionality as well and saw that Fusion IO was/is showing off a beefy 8-way HP DL980 with 14 HP-branded IO accelerators at Oracle Openworld -
We're running Oracle Enterprise Edition database version 11g Release 2 on a single eight processor HP ProLiant DL980 G7 system integrated with 14 Fusion ioMemory-based HP IO Accelerators, achieving performance of more than 600,000 IOPS with over 6GB/s bandwidth using a real world, read/write mixed workload.
[..]
the HP Data Accelerator Solution for Oracle is configured with up to 12TB of high performance flash[..]
After reading that I could not help but think how HP's own Vertica, with its extremely optimized encoding and compression scheme, would run on such a beast. I mean if you can get 10x compression out of the system (Vertica's best-case real world is 30:1 for reference), get a pair of these boxes (Vertica would mirror between the two) and you have upwards of 240TB of data to play with.
I say 240TB because the way Vertica mirrors the data allows you to store it in a different sort order on the mirror, allowing for even faster access if you're querying the data in different ways. Who knows - with the compression you may be able to get much better than 10:1 depending on your data.
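The 240TB number is just multiplication, sketched below in Python. The function name is hypothetical, and counting both mirrored copies as usable data is the assumption described above (each copy in a different sort order); if you treat the mirror purely as redundancy the unique data is half that.

```python
def logical_capacity_tb(flash_tb_per_box, compression_ratio, boxes, count_mirror=True):
    """Logical (pre-compression) data that fits on the flash in a set of boxes.

    count_mirror=True treats every mirrored copy as useful data, since each
    copy can be kept in a different sort order. With count_mirror=False only
    one copy's worth is counted.
    """
    per_box = flash_tb_per_box * compression_ratio
    return per_box * boxes if count_mirror else per_box


# 12TB of flash per DL980, a pair of boxes, 10:1 compression.
print(logical_capacity_tb(12, 10, 2))                      # both copies counted
print(logical_capacity_tb(12, 10, 2, count_mirror=False))  # unique data only
print(logical_capacity_tb(12, 30, 2))                      # best-case 30:1
```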
Vertica is so fast that you will probably end up CPU bound more than anything else - 80 cores per server is quite a lot though! The DL980 supports up to 16 PCI Express slots so even with 14 cards that still leaves room for a couple 10GigE ports and/or Fibre channel or some other form of connectivity other than what's on the motherboard (which seems to have an optional dual port 10GbE NIC)
With Vertica's licensing (last I checked) starting in the 10s of thousands of dollars per raw TB (before compression), it falls into the category where I'd blow a ton of money on hardware to make it run the best it possibly can (same goes for Oracle - though Standard Edition to a lesser degree). Vertica is coming out with a Community Edition soon which I believe is free. I don't recall what the restrictions are; I think one of them was that it is limited to a single server. I don't recall hearing what the storage limits might be (I'd assume there would be some limit, maybe half a TB or something).
HDS absorbs BlueArc
It seems HDS has finally decided to buy out BlueArc after what was either two or three failed attempts at an IPO.
BlueArc, along with my buddies over at 3PAR is among the few storage companies that puts real silicon to work in their system for the best possible performance. Their architecture is quite impressive and the performance (that is for their mid range system) shows.
I have only been exposed to their older stuff (5-6 year old technology) directly, not their newer technology. But even their older stuff was very fast and efficient, very reliable and had quite a few nifty features as well. I think they were among the first to do storage tiering (for them at the file level).
[ warning - a wild tangent gets thrown in here somewhere ]
While their NAS technology was solid (IMO), their disk technology was not. They relied on LSI storage, and the quality of the storage was very poor overall. First off, whoever set up the system we had configured everything as RAID 5 12+1. Then there were the long RAID rebuild times, the constantly moving hot spots because of the number of different tiers of storage we had, and the fact that the 3 Titan head units were not clustered, so we had to take hard downtime for software upgrades (not BlueArc's fault, other than perhaps making it too expensive to do clustered heads when the company bought the stuff - long before I was there). Every time we engaged with BlueArc 95% of our complaints were about the disk. For the longest time they tried to insist that "disk doesn't matter", that you could put any storage system behind the BlueArc and it would be the same.
After the 3rd or 4th engagement BlueArc significantly changed their tune (not sure what prompted it). They now acknowledged the weakness of the low tier storage and were promoting the use of HDS AMS storage (USP was, well, waaaaaaaay out of our price range) since they were a partner of HDS back then as well. The HDS proposal fell far short of the design I had with 3PAR, whose NAS partner of choice at the time was Exanet.
If I could have chosen I would have used BlueArc for NAS and 3PAR for disk. 3PAR was open to the prospect of course. BlueArc claimed they had contacted 3PAR to start working with them but 3PAR said that never happened. Later BlueArc acknowledged they were not going to try to work with 3PAR (or any other storage company other than LSI or HDS - I think 3PAR was one digit too long for them to handle).
Given the BlueArc system lacked the ability to provide us with any truly useful disk performance statistics, it was tough coming up with a configuration that I thought would work as a replacement. There were a large number of factors involved, and any one of them had a fairly wide margin of error. You could say I pulled a number out of my ass, but I did do more calculations than that: I have about a dozen pages of documentation I wrote at the time on the project. Really though, at the end of the day the initial configuration was a stab in the dark.
BlueArc as a company didn't really have their support stuff all figured out at the time. The first sign was when a scheduled downtime for a software upgrade that was intended to take 2-3 hours ended up taking 10-11 hours, because there was a problem and BlueArc lacked the proper escalation procedures to resolve it quickly enough. Their CEO sent us a letter later saying that they fixed that process in the company. The second sign was when I went to them and asked them to confirm the drive type/size of all of our disks so I could do some math for the replacement system. They did a new audit (had to be on site to do it for some reason), and it turned out we had about 80 more spindles than they thought we had (we bought everything through them). I don't know how you lose track of that many disks for support but somehow it fell through the cracks. Another issue was that we paid BlueArc to relocate the system to another facility (again before I was at the company), and whoever moved it didn't do a good job: they accidentally plugged both power supplies of a single shelf into the same PDU. A PSU blew at one point and took out the PDU, which then took out that shelf, which then took out the file system the shelf was on. Fortunately it was a non production system.
Even after all of that my main problem with their solution was the disks. LSI was not up to snuff and the proposal from HDS wasn't going to cut it. I told my management that there is no doubt that HDS could come up with a solution that would work -- it's just that what they had proposed would not (they didn't even have thin provisioning at the time; 3PAR was telling me HDS was pairing USP-Vs with AMSs in order to try to compete in the meantime, but they did not propose that to us). Their proposal was a combination of poor performing SATA (on RAID-6 no less) for bulk storage and higher performing 15k RPM disks for the higher tier with less storage. HDS/BlueArc felt it was equivalent to what I had specified through 3PAR and Exanet, not understanding the architectural advantages the 3PAR system had over the proposed HDS design (going into specifics would take too long, and you probably know them by now anyway if you're here). Not to mention what seemed like sheer incompetence among the HDS team that was supporting us: it seemed they could not answer anything I asked without engaging someone from Japan, and even then I rarely got a comprehensible answer.
So in the end we ended up replacing a 4-rack BlueArc system with what could have been a single rack 3PAR plus a few rack units for the Exanet, but we had to put the 3PAR in two racks due to weight constraints in the data center. We went from 500+ disks (mix of SATA-I and 10k RPM FC) to 200 disks of SATA-II (RAID 5 5+1). With the change we got the advantage of being able to run Fibre Channel (which we ran to all of our VM boxes as well as primary databases) and iSCSI (which we used here and there; 3PAR's iSCSI support has never been as good as I would have liked, though for anything serious I'd rather use FC anyway, and that's what 3PAR's customers did, which led to some neglect on the iSCSI front).
Half the floor space, half the power usage, roughly the same amount of usable storage, about the same amount of raw storage. We put the 3PAR/Exanet system through its paces with our most I/O intensive workload at the time and it absolutely screamed. I mean it exceeded everyone's expectations (mine included). But that was only the beginning.
This is a story I like to tell on 3PAR reference calls when I do them which is becoming more and more rare these days. In the early days of our 3PAR/Exanet deployment the Exanet engineer tried to assure me that they were thin provisioning friendly, he had personally used 3PAR+Exanet in the past and it worked fine. So with time constraints and stuff I provisioned a file system on the Exanet box not thinking too much about the 3PAR end of things. It's thin provisioning friendly right? RIGHT?
Well, not so much. Before you knew it the system was in production and we started dumping large amounts of data on it, and deleting large amounts of data on it. Within a few weeks I found out the Exanet box was preferring to allocate new space rather than reclaim deleted space. I did some calculations and the result was not good: if we let the system continue at this rate we were going to exceed the capacity of the 3PAR box if the Exanet file system was allowed to grow to its full potential. Not good.. Compound that with the fact that we were at the maximum addressable capacity of a 2-node 3PAR box. If I had to add even 1 more disk to the system (not that adding 1 disk is possible in that system due to the way disks are added, minimum is 4), I would have had to put in 2 more controllers, which as you might expect is not exactly cheap. So I was looking at what could have been either a very costly downtime to do data migration or a very costly upgrade to correct my mistake.
Dynamic optimization to the rescue. This really saved my ass. I mean really, it did. When I built the system I used RAID 5 3+1 for performance (for 3PAR that is roughly 8% slower than RAID 10, and 3PAR fast RAID 5 is probably on par with many other vendors' RAID 10 due to the architecture).
So I ran some more calculations and determined if I could get to RAID 5 5+1 I would have enough space to survive. So I began the process, converting roughly a half dozen LUNs at a time, 24 hours a day, 7 days a week. It took longer than I expected, since the 3PAR was routinely getting hammered from daily activities from all sides. It took about 5 months in the end to convert all of the volumes. Throughout the process nobody noticed a thing. The array was converting volumes 24 hours a day for 5 months straight and nobody noticed (except me, who was babysitting it hoping I could beat the window). If I recall right I probably had 3-4 weeks of buffer space; if my conversions had taken an extra month I would have exceeded the capacity of the system. So I got lucky I suppose, but I also bought the system knowing I could make such course corrections online without impacting applications for just that kind of event -- I just didn't expect the event to be so soon and on such a large scale.
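For anyone wondering how much space that conversion actually buys, here is a minimal Python sketch of the parity overhead math. It only models the data-to-parity ratio of a RAID 5 set; it ignores spare chunklets, metadata and anything else 3PAR specific, and the function name is my own.

```python
from fractions import Fraction


def raid5_usable_fraction(data_members, parity_members=1):
    """Fraction of raw space that holds data in a RAID 5 set of the given width."""
    return Fraction(data_members, data_members + parity_members)


before = raid5_usable_fraction(3)   # RAID 5 3+1
after = raid5_usable_fraction(5)    # RAID 5 5+1
gain = after / before - 1           # extra usable space from the same raw capacity

print(f"3+1 usable: {float(before):.1%}")
print(f"5+1 usable: {float(after):.1%}")
print(f"usable space gained by converting: {float(gain):.1%}")
```

So going from 3+1 to 5+1 squeezes roughly 11% more usable space out of the same disks, at the cost of wider stripes.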
One of the questions I had for HDS at the time we were looking at them was could they do the same online RAID conversions. The answer? Absolutely they can. But the fine print was (assuming it still is) you needed blank disks to do the migration to. Since 3PAR's RAID is performed at the sub disk level no blank disks are required, only blank "chunklets" as they call them. Basically you just need enough empty space on the array to mirror the LUN/Volume to the new RAID level and then you break the mirror and eliminate the source (this is all handled transparently with a single command and some patience depending on system load and volume of data in flight).
As time went on we loaded the system with ever more processes and connected ever more systems to it as we got off the old BlueArc(s). I kept seeing the IOPS (and disk response times) on the 3PAR going up..and up.. I really thought it was going to choke, I mean we were pushing the disks hard, with sustained disk response times in the 40-50ms range at times(with rare spikes to well over 100ms). I just kept hoping for the day when we would be done and the increase would level off, and it did for the most part eventually. I built my own custom monitoring system for the array for performance trending, since I didn't like the query based tool they provided as much as what I could generate myself(despite the massive amount of time it took to configure my tool).
I did not know a 7200RPM SATA disk could do 127 IOPS of random I/O.
We had this one process that would dump up to say 50GB of data from upwards of 40-50 systems simultaneously as fast as they could go. Needless to say when this happened it blew out all of the caches across the board and brought things to a grinding halt for some time (typically 30-60 seconds). I would see the behavior on the NAS system and log in to my monitoring tool and just see it hang while it tried to query the database (which was on the storage). I would cringe, and wait for the system to catch up. We tried to get them to re-design the application so it was more thoughtful of the storage but they weren't able to. Well, they did re-design it one time (for the worse). I tried to convince them to put it on Fusion IO on local storage in the servers but they would have no part of it. Ironically not long after I left the company they went out and bought some Fusion IO for another project. I guess as long as the idea was not mine it was a good one.. The storage system was entirely a back office thing, no real time end user transactions ever touched it, which meant we could live with the higher latency by pushing the SATA drives 30-50% beyond engineering specifications.
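To put the 127 IOPS figure in context, here is a back-of-the-envelope Python sketch using the usual rule of thumb that a random I/O costs one average seek plus half a rotation. The 8.5 ms seek time is purely an assumed, typical-looking number for a 7200RPM drive, not something taken from the spec sheet of the disks in that array, and both function names are made up.

```python
def random_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS: 1 / (average seek + half a rotation)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms


def implied_baseline(observed_iops, fraction_over):
    """Baseline IOPS implied by running a given fraction over spec."""
    return observed_iops / (1 + fraction_over)


ASSUMED_SEEK_MS = 8.5  # assumption for illustration only

estimate = random_iops(7200, ASSUMED_SEEK_MS)
print(f"rule-of-thumb estimate: {estimate:.0f} IOPS")
print(f"baseline if 127 IOPS is 30% over: {implied_baseline(127, 0.30):.0f} IOPS")
print(f"baseline if 127 IOPS is 50% over: {implied_baseline(127, 0.50):.0f} IOPS")
```

With that assumed seek time the rule of thumb lands near 80 IOPS per disk, while 30-50% over spec implies a baseline somewhere around 85-98 IOPS, so the real drives presumably had somewhat better seek numbers than my guess.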
At the end of the first full year of operation we finally got budget to add capacity to the system. We had shrunk the overall theoretical I/O capacity by probably 2/3rds vs the previous array, and had absorbed what seemed like almost 200% growth on top of that during the first year, and the system held up. I probably wouldn't have believed it if I hadn't seen it (and lived it) personally. I hammered 3PAR as often as I could to increase the addressable capacity of their systems, which was limited by the operating system architecture. It doesn't take a rocket scientist to see that their systems had 4GB of control cache (per controller), which is a common limit of 32-bit software. But the software enhancement never came, while I was there at least. It is there in some respect in the new V-class, though as mentioned the V-class seems to have had an arbitrary raw capacity limit placed on it that does not align with the amount of control cache it can have (up to 32GB per controller). With 64-bit software and more control cache I could have doubled or tripled the capacity of the system without adding controllers.
Adding the two extra controllers did give us one thing I wanted - Persistent cache, that's just an awesome technology to have and you simply can't do that kind of thing on a 2-controller system. Also gave us more ports than I knew what to do with.
What happened to the BlueArc? Well after about 10 months of trying to find someone to sell it to - or give it to -- we ended up paying someone to haul it away. When HDS/BlueArc was negotiating with us on their solution they tried to harp on how we could leverage our existing disk from BlueArc in the new solution as another tier. I didn't have to say it my boss did which made me sort of giggle - he said the operational costs of running the old BlueArc disk (support was really high, + power and co-lo space) was more than the disks were worth, BlueArc/HDS didn't have any real response to that. Other than perhaps to nod their heads acknowledging that we're smart enough to realize that fact.
I still would like to use BlueArc again. I think it's a fine platform, I just want to use my own storage on it.
This ended up being a lot longer than I expected! Hope you didn't fall asleep. Just got right to 2600 words.. there.
Farewell Terremark – back to co-lo
I mentioned not long ago that I was going co-lo once again. I was co-lo for a while for my own personal services but then my server started to act up (the server would be 6 years old if it was still alive today) with disk "failure" after failure (or at least that's what the 3ware card was predicting; eventually it stopped complaining and the disk never died again). So I thought - do I spend a few grand to buy a new box or go "cloud"? I knew up front cloud would cost more in the long run but I ended up going cloud anyways as a stop gap - I picked Terremark because it had the highest quality design at the time (still does).
During my time with Terremark I never had any availability issues, there was one day where there was some high latency on their 3PAR arrays though they found & fixed whatever it was pretty quick (didn't impact me all that much).
I had one main complaint with regards to billing - they charge $0.01 per hour for each open TCP or UDP port on their system, and they have no way of doing 1:1 NAT. For a web server or something this is no big deal, but for me I needed a half dozen or more ports open per system(mail, dns, vpn, ssh etc) after cutting down on ports I might not need, so it starts to add up, indeed about 65% of my monthly bill ended up being these open TCP and UDP ports.
Once both of my systems were fully spun up (the 2nd system only recently got fully spun up as I was too lazy to move it off of co-lo) my bill was around $250/mo. My previous co-lo was around $100/mo and I think I had them throttle me to 1Mbit of traffic (this blog was never hosted at that co-lo).
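Here is a quick Python sketch of how the per-port charge adds up. The $0.01 per port per hour rate, the roughly $250 bill and the 65% share come from above; the 730 hours per month figure (8760 / 12) and the function names are my own assumptions for illustration.

```python
HOURS_PER_MONTH = 730        # assumption: 8760 hours per year / 12
RATE_PER_PORT_HOUR = 0.01    # Terremark's charge per open TCP/UDP port


def monthly_port_cost(open_ports, rate=RATE_PER_PORT_HOUR, hours=HOURS_PER_MONTH):
    """Monthly charge for keeping a number of ports open around the clock."""
    return open_ports * rate * hours


def implied_ports(monthly_bill, port_share, rate=RATE_PER_PORT_HOUR, hours=HOURS_PER_MONTH):
    """How many open ports a given share of the bill works out to."""
    return monthly_bill * port_share / (rate * hours)


print(f"one port: ${monthly_port_cost(1):.2f}/mo")
print(f"six ports: ${monthly_port_cost(6):.2f}/mo")
print(f"ports implied by 65% of a $250 bill: {implied_ports(250, 0.65):.0f}")
```

Each open port works out to a bit over $7 a month, so a couple dozen ports across both systems is enough to dominate the bill.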
The one limitation I ran into on their system was that they could not assign more than 1 IP address for outbound NAT per account. In order to run SMTP I needed each of my servers to have their own unique outbound IP. So I had to make a 2nd account to run the 2nd server. Not a big deal(for me, ended up being a pain for them since their system wasn't setup to handle such a situation), since I only ran 2 servers (and the communications between them were minimal).
As I've mentioned before, the only part of the service that was truly "bill for what you use" was bandwidth usage, and for that I was charged between 10-30 cents/month for my main system and 10 cents/month for my 2nd system.
Oh - and they were more than willing to setup reverse DNS for me which was nice (and required for running a mail server IMO). I had to agree to a lengthy little contract that said I wouldn't spam in order for them to open up port 25. Not a big deal. The IP addresses were "clean" as well, no worries about black listing.
Another nice thing to have, if they had offered it, would be billing based on resource pools. As usual they charge for what you provision (per VM) instead of what you use. When I talked to them about their enterprise cloud offering they charged for the resource pool (unlimited VMs in a given amount of CPU/memory), but this is not available on their vCloud Express platform.
It was great to be able to VPN to their systems to use the remote console (after I spent an hour or two determining the VPN was not going to work in Linux, despite my best efforts to extract Linux versions of the VMware console plugin and try to use it). Mount an ISO over the VPN and install the OS. That's how it should be. I didn't need the functionality but I don't doubt I would have been able to run my own DHCP/PXE server there as well if I wanted to install additional systems in a more traditional way. Each user gets their own VLAN, and is protected by a Cisco firewall, and load balanced by a Citrix load balancer.
A couple of months ago the thought came up again of off site backups. I don't really have much "critical" data but I felt I wanted to just back it all up, because it would be a big pain if I had to reconstruct all of my media files for example. I have about 1.7TB of data at the moment.
So I looked at various cloud systems including Terremark, but it was clear pretty quickly that no cloud company was going to be able to offer this service in a cost effective way, so I decided to go co-lo again. Rackspace was a good example; they have a handy little calculator on their site. This time around I went and bought a new, more capable server.
So I went to a company I used to buy a ton of equipment from in the bay area and they hooked me up with not only a server with ESXi pre-installed on it but co-location services (with "unlimited" bandwidth), and on-site support for a good price. The on-site support is mainly because I'm using their co-location services(which in itself is a co-lo inside Hurricane Electric) and their techs visit the site frequently as-is.
My server has a single socket quad core processor, 4x2TB SAS disks (~3.6TB usable, which also matches my usable disk space at home, which is nice), a 3ware RAID controller with battery backed write-back cache, a little USB thing for ESXi (I'd rather have ESXi on the HDD but 3ware is not supported for booting ESXi), 8GB of registered ECC RAM and redundant power supplies. I went with SAS because VMware doesn't support VMFS on SATA (though technically you can do it), and the price premium for SAS wasn't nearly as high as I was expecting. It also has decent remote management with a web UI, remote KVM access, remote media etc. For co-location I asked for (and received) 5 static IPs (3 IPs for VMs, 1 IP for ESX management, 1 IP for out of band management).
My bandwidth needs are really tiny, typically 1GB/month. Though now with off site backups that may go up a bit (in bursts). Only real drawback to my system is the SAS card does not have full integration with vSphere so I have to use a cli tool to check the RAID status, at some point I'll need to hook up nagios again and run a monitor to check on the RAID status. Normally I setup the 3Ware tools to email me when bad things happen, pretty simple, but not possible when running vSphere.
The amount of storage on this box I expect to last me a good 3-5 years. The 1.7TB includes every bit of data that I still have going back a decade or more - I'm sure there's a couple hundred gigs at least I could outright delete because I may never need it again. But right now I'm not hurting for space so I keep it there, on line and accessible.
My current setup
- One ESX virtual switch on the internet that has two systems on it - a bridging OpenBSD firewall, and a Xangati system sniffing packets(still playing with Xangati). No IP addresses are used here.
- One ESX virtual switch for one internal network, the bridging firewall has another interface here, and my main two internet facing servers have interfaces here, my firewall has another interface here as well for management. Only public IPs are used here.
- One ESX virtual switch for another internal network for things that will never have public IP addresses associated with them. I run NAT on the firewall (on its 3rd/4th interfaces) for these systems to get internet access.
I have a site to site OpenVPN connection between my OpenBSD firewall at home and my OpenBSD firewall on the ESX system, which gives me the ability to directly access the back end, non routable network on the other end.
Normally I wouldn't deploy an independent firewall, but I did in this case because, well I can. I do like OpenBSD's pf more than iptables(which I hate), and it gives me a chance to play around more with pf, and gives me more freedom on the linux end to fire up services on ports that I don't want exposed and not have to worry about individually firewalling them off, so it allows me to be more lazy in the long run.
I bought the server before I moved, once I got to the bay area I went and picked it up and kept it over a weekend to copy my main data set to it then took it back and they hooked it up again and I switched my systems over to it.
The server was about $2900 w/1 year of support, and co-location is about $100/mo. So disk space alone the first year(taking into account cost of the server) my cost is about $0.09 per GB per month (3.6TB), with subsequent years being $0.033 per GB per month (took a swag at the support cost for the 2nd year so that is included). That doesn't even take into account the virtual machines themselves and the cost savings there over any cloud. And I'm giving the cloud the benefit of the doubt by not even looking at the cost of bandwidth for them just the cost of capacity. If I was using the cloud I probably wouldn't allocate all 3.6TB up front but even if you use 1.8TB which is about what I'm using now with my VMs and stuff the cost still handily beats everyone out there.
What's craziest is that I lack the purchasing power of any of these clouds out there. I'm just a lone consumer who bought one server. Granted, I'm confident the vendor I bought from gave me excellent pricing due to my past relationship, though probably still not on the scale of the likes of Rackspace or Amazon, and yet I can handily beat their costs without even working for it.
What surprised me most during my trips doing cost analysis of the "cloud" is how cheap enterprise storage is. I mean Terremark charges $0.25/GB per month (on SATA powered 3PAR arrays), Rackspace charges $0.15/GB per month (I believe Rackspace just uses DAS). I would have expected the enterprise storage route to cost say 3-5x more, not less than 2x. When I was doing real enterprise cloud pricing, storage for the solution I was looking for typically came in at 10-20% of the total cost, with 80%+ of the cost being CPU+memory. For me it's a no-brainer - I'd rather pay a bit more and have my storage on a 3PAR of course (when dealing with VM-based storage, not bulk archival storage). With the average cost of my storage for 3.6TB over 2 years coming in at $0.06/GB per month it makes more sense to just do it myself.
I just hope my new server holds up, my last one lasted a long time, so I sort of expect this one to last a while too, it got burned in before I started using it and the load on the box is minimal, would not be too surprised if I can get 5 years out of it - how big will HDDs be in 5 years?
I will miss Terremark because of the reliability and availability features they offer, they have a great service, and now of course are owned by Verizon. I don't need to worry about upgrading vSphere any time soon as there's no reason to go to vSphere 5. The one thing I have been contemplating is whether or not to put my vSphere management interface behind the OpenBSD firewall(which is a VM of course on the same box). Kind of makes me miss the days of ESX 3, when it had a built in firewall.
I'm probably going to have to upgrade my cable internet at home, right now I only have 1Mbps upload which is fine for most things but if I'm doing off site backups too I need more performance. I can go as high as 5Mbps with a more costly plan. 50Meg down 5 meg up for about $125, but I might as well go all in and get 100meg down 5 meg up for $150, both plans have a 500GB cap with $0.25/GB charge for going over. Seems reasonable. I certainly don't need that much downstream bandwidth(not even 50Mbps I'd be fine with 10Mbps), but really do need as much upstream as I can get. Another option could be driving a USB stick to the co-lo, which is about 35 miles away, I suppose that is a possibility but kind of a PITA still given the distance, though if I got one of those 128G+ flash drives it could be worth it. I've never tried hooking up USB storage to an ESX VM before, assuming it works? hmmmm..
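To put the upload speeds in perspective, here is a quick sketch of how long it would take to push a flash drive's worth of data over the wire. The function name is hypothetical, and I'm assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/sec) and ignoring protocol overhead, so real transfers would be somewhat slower.

```python
# Rough upload time for an off-site backup at a given upstream speed.
# Assumptions: decimal units, a fully saturated link, no protocol overhead.

def upload_hours(size_gb, upstream_mbps):
    """Hours needed to upload size_gb at upstream_mbps."""
    bits = size_gb * 8 * 1000**3
    seconds = bits / (upstream_mbps * 1_000_000)
    return seconds / 3600

for mbps in (1, 5):
    hours = upload_hours(128, mbps)
    print(f"128 GB at {mbps} Mbps: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Even at 5Mbps a 128G drive's worth of data is more than two days of saturated upstream, which is why a 35 mile drive starts to look competitive.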
Another option I have is AT&T Uverse, which I've read good and bad things about - but looking at their site, their service is slower than what I can get through my local cable company (which truly is local, they only serve the city I am in). Another reason I didn't go with Uverse for TV is that I suspected the technology they use is not compatible with my Tivo (with cable cards). AT&T doesn't mention their upstream speeds specifically, so I'll contact them and try to figure that out.
I kept the motherboard/cpus/ram from my old server, my current plan is to mount it to a piece of wood and hang it on the wall as some sort of art. It has lots of colors and little things to look at, I think it looks cool at least. I'm no handyman so hopefully I can make it work. I was honestly shocked how heavy the copper (I assume) heatsinks were, wow, felt like 1.5 pounds apiece, massive.
While my old server is horribly obsolete, one thing it does have over my new server is being able to support more RAM. The old server could go up to 24GB (I had a max of 6GB in it at the time), the new server tops out at 8GB (and has 8GB in it). Not a big deal as I don't need 24GB for my personal stuff but just thought it was kind of an interesting comparison.
This blog has been running on the new server for a couple of weeks now. One of these days I need to hook up some log analysis stuff to see how many dozen hits I get a month.
If Terremark could fix three areas of their vCloud express service - one being resource pool-based billing, another being relaxing the costs behind opening multiple ports in the firewall (or just giving 1:1 NAT as an option), and the last one being thin provisioning friendly billing for storage -- it would really be a much more awesome service than it already is.
Mac Daddy P10000
It's finally here, the HP P10000 - aka 3PAR V Class. 3PAR first revealed this to their customers more than a year ago, but the eagle has landed now.
When it comes to the hardware - bigger is better (usually means faster too)
Comparisons of recent 3PAR arrays
| Array | Raw Capacity | Fibre Ports | Data Cache | Control Cache | Disks | Interconnect Bandwidth | I/O Bandwidth | SPC-1 IOPS |
|---|---|---|---|---|---|---|---|---|
| 8-node P10000 (aka V800) | 1,600 TB | 288 ports (192 host) | 512 GB | 256 GB | 1,920 | 112 GB/sec | 96 GB/sec | 600,000 (guess) |
| 8-node T800 | 800 TB | 192 ports (128 host) | 96 GB | 32 GB | 1,280 | 45 GB/sec | 19.2 GB/sec | 225,000 |
| 4-node T800 (or 4-node T400) | 400 TB | 96 ports (64 host) | 48 GB | 16 GB | 640 | 9.6 GB/sec | ? | ~112,000 (estimate) |
| 4-node F400 | 384 TB | 32 ports (24 host) | 24 GB | 16 GB | 384 | 9.6 GB/sec ? | ? | 93,000 |
The new system is based on their latest Generation 4 ASIC, and for the first time they are putting two ASICs in each controller. This is also the first system that supports PCI Express, with, if my memory serves, 9 PCI Express buses per controller. Front end throughput is expected to be up in the 15 Gigabytes/second range (up from ~6 GB/sec on the T800). Just think, they have nearly eight times as much interconnect bandwidth as the controllers have capacity to push data to hosts, that's just insane.
IOPS - HP apparently is not in a big rush to post SPC-1 numbers, but given the increased spindle count, cache, doubling up on ASICs, and the new ASIC design itself I would be surprised if the system would get less than say half a million IOPS on SPC-1 (by no means a perfect benchmark but at least it's a level playing field).
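As a sanity check on that half a million figure, here is a small sketch that scales the T800's published SPC-1 result by spindle count alone, using the numbers from the table. This is a naive linear model I'm assuming purely for illustration. It ignores cache, the second ASIC, and everything else that makes the V class different.

```python
# Naive estimate: scale the T800 SPC-1 result by disk count only.
# Assumption: IOPS scale linearly with spindles (a simplification).

T800_SPC1_IOPS = 225_000
T800_DISKS = 1_280
V800_DISKS = 1_920

iops_per_disk = T800_SPC1_IOPS / T800_DISKS
spindle_scaled = iops_per_disk * V800_DISKS

# How much extra the new ASICs and cache would need to contribute
# to reach half a million IOPS.
uplift_needed = 500_000 / spindle_scaled

print(f"T800 IOPS per disk: {iops_per_disk:.1f}")
print(f"V800 by spindle count alone: {spindle_scaled:,.0f}")
print(f"uplift needed for 500,000: {uplift_needed:.2f}x")
```

So spindles alone only get you about two thirds of the way there, and the rest would have to come from the cache and the new ASIC design.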
It's nice to see 3PAR finally bulk up on data cache (beefcake!!) - I mean traditionally they don't need it all that much because their architecture blows the competition out of the water without breaking a sweat - but still - ram is cheap - it's not as if they're using the same type of memory you find in CPU cache - it's industry standard ECC DIMMs. RAM may be cheap, but I'm sure HP won't charge you industry standard DIMM pricing when you go to put 512GB in your system!
Now that they have PCI Express, 3PAR can natively support 8Gbps fibre channel, as well as 10Gbit iSCSI and FCoE, which are coming soon.
The drive cages and magazines are more or less unchanged (physically) from the previous generation but apparently new stuff is still coming down the pike there. The controller's physical design (how it fits in the cabinet) seems radically different than their previous S or T series.
Another enhancement for this system is they expanded the number of drive chassis to 48, or 12 per node pair (up from 8 per node pair). Though if you go back in time you'll find their earliest S800 actually supported 64 drive chassis for a time (or 2,560 disks, back when disks were 9GB in size). They got to that original 64 chassis configuration by daisy chaining drive chassis, which they have since refrained from doing on their S/T/V class. The V class obviously has more ports so they can support more cages. I have no doubt they could go to even more cages by taking ports assigned to hosts and assigning them to disks, just a matter of testing. Flipping a fiber port from host to disk is pretty trivial on the system.
The raw capacity doesn't quite line up with the massive amount of control cache the system has. In theory at least, if 4GB of control cache per controller is good enough for 200TB raw (per controller pair), then 32GB per controller should be able to net you 1,600 TB raw per controller pair, or 6,400 TB for the whole system. But obviously with a limit of 1,600 TB put in for the entire system they are using a lot of control cache for something else.
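Here is that control cache arithmetic spelled out as a sketch. The linear relationship between control cache and raw capacity is my own assumption for the sake of argument, not anything 3PAR has published, and the function name is made up.

```python
# Control cache vs raw capacity, assuming raw capacity scales linearly
# with control cache per controller (an assumption, not a vendor spec).

def raw_tb_per_pair(cache_gb_per_controller, baseline_gb=4, baseline_tb=200):
    """Raw TB a controller pair could address at the T-class ratio."""
    return cache_gb_per_controller / baseline_gb * baseline_tb

CONTROLLER_PAIRS = 4          # 8-node V800
ACTUAL_LIMIT_TB = 1600        # stated limit for the whole system

per_pair = raw_tb_per_pair(32)
whole_system = per_pair * CONTROLLER_PAIRS

# Cache per controller that the actual limit would need at the same ratio.
needed_gb = ACTUAL_LIMIT_TB / CONTROLLER_PAIRS / 200 * 4

print(f"theoretical: {per_pair:.0f} TB per pair, {whole_system:.0f} TB total")
print(f"cache needed for {ACTUAL_LIMIT_TB} TB: {needed_gb:.0f} GB per controller")
```

By that ratio only about a quarter of the 32GB per controller would be needed to address 1,600 TB, leaving the rest for whatever else they are doing with it.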
As far as I know the T-class isn't going anywhere anytime soon, this V class is all about even more massive scale, at a significantly higher entry level price point than the T-class(at least $100,000 more at the baseline from what I can tell), with the beauty of running the same operating system, the same user interfaces, the same software features across the entire product line. The T-class, as-is still is mind numbingly fast and efficient, even three years after it was released.
No mainframe connectivity on this baby.
Storage Federation
The storage federation stuff is pretty cool in that it is peer based, you don't need any external appliances to move the data around, the arrays talk to each other directly to manage all of that. This is where we get the first real integration between 3PAR and HP, in that the entire line of 3PAR arrays as well as the LeftHand-based P4000 iSCSI systems (including the Virtual Storage Appliance even!) support this new peer federation (sort of makes me wonder where EVA support is - perhaps it's coming later or maybe it's a sign HP is sort of deprecating EVA when it comes to this sort of thing - I'm sure the official party line will be EVA is still a shining star).
The main advantage I think of storage federation technology over something like storage vMotion is the array has a more holistic view of what's going on in the storage system rather than just what a particular host sees, or what a particular LUN is doing. The federation should also have more information about the location of the various arrays if they are in another data center or something and make more intelligent choices about moving stuff around. Certainly would like to see it in action myself. Even though hypervisors have had thin provisioning for a while - by no means does it reduce the need for thin provisioning at the storage level (at least for larger deployments).
I'd imagine like most things on the platform the storage federation is licensed based on the capacity of the array.
If this sort of thing interests you anywhere nearly as much as it interests me you should check out the architecture white paper from HP which has some new stuff from the V class here. You don't have to register to download it like you did back in the good ol' days.
I'd be surprised if I ever decided to work for a company large enough to be able to leverage a V-class, but if anyone from 3PAR is out there reading this (I'm sure there's more than one), since I am in the Bay Area - not far from your HQ - I wouldn't turn down an invitation to see one of these in person.
Oh HP.. first you kick me in the teeth by killing WebOS devices then before I know what happened you come out with a V-class and want to make things all better, I just don't know what to feel.
The joys of working with a 3PAR array, it's been about a year since I laid my hands on one (working at a different company now), I do miss it.
Oracle picks up Pillar
Most people have been expecting this for a long time, and have wondered why it didn't happen sooner, with Oracle ditching HDS as an OEM partner almost immediately after acquiring Sun.
I have read, and heard over the past year that Oracle has been for the most part destroyed in the storage market (servers doing badly as well) as a result since their Sun storage products just are not competitive. Many larger customers have been leaving to the likes of HP and IBM who could offer the "one stop shop" for servers and storage (even before HP bought 3PAR, HP had and still has their OEM'd HDS equipment).
In some informal talks with some HDS folks last year they seemed quite happy that Oracle was no longer an OEM, saying that the people over at Sun/Oracle weren't competent enough to handle the HDS stuff (*cough* too complicated *cough*), and so HDS just went in direct with most of those customers that Oracle walked away from.
Finally someone at Oracle woke up and realized there still is, and will continue to be for some time a big market for traditional SAN systems, far bigger than the market of customers willing to risk putting their data on cheap SATA controllers on servers running ZFS with high failure rates and poor performance.
So it finally happened, Oracle is buying Pillar. At first look however it really does seem like an odd scenario, from their SEC filing -
The Earn-Out therefore will only be paid to Mr. Ellison, his affiliates and, if applicable, to the other Pillar Data stockholders and option holders if the Net Revenues during Year 3 of the Earn-Out Period exceed the Net Losses, if any, during the entire Earn-Out Period.
There's no specific mention whether or not Larry is going to pay himself back for the $500M+ in loans he has given to Pillar over the years, so I suppose not. In any case it won't be until the end of 2014 when we might discover what value Oracle has placed on Pillar. One commenter on The Register mentions Pillar's revenue as $29M per year, don't know where that came from though. Doing some searching myself I found references ranging from roughly $70M in revenue to $3B in revenue (if that was the case they would have IPO'd).
I think it's a good deal for Pillar too, they get much better validation on their products in front of customers.
I've gone through quite a bit of the information on the Pillar web site and to-date I have not seen anything that would make me want to buy their product, and have yet to hear any positive words coming from the people I know in the street/industry (granted my community is limited).
But it sure as hell beats anything that Oracle has been offering their customers recently, that alone may be enough to drive a decent amount of sales.
Pillar posted some updated SPC-1 numbers recently, a significant improvement over their original numbers, though nothing ground breaking from a competitive standpoint.
In other news, two early social media giants have fallen - MySpace being acquired for $35M, and Friendster re-inventing itself as a gaming site with Facebook authentication. I'd bet the infrastructure behind MySpace is worth about $35M by itself - News Corp really wanted out!
Misleading 3PAR
Hello to my two readers out there!
You know I like 3PAR, have been using them for years, and know their stuff inside and out.
I was on a Computerworld article a few moments ago and saw an advertisement for 3PAR by HP, and it made me cringe.
While it is true that 3PAR has Intel Xeon processors, it's really the custom built ASIC that does all the heavy lifting. General purpose CPUs don't have a prayer in being able to keep up, much like general purpose CPUs don't have a prayer in keeping up with high performance network switching fabrics.
I think the advertisement is bad, and misleading (by misleading I mean removing perceived value of the 3PAR platform by implying that Intel processors are the workhorse on the system). I'm sure that 3PAR would never have made this mistake on their own. Someone at HP needs to be educated on the platform.
So I have to knock HP on that one.
Next Gen COPAN
About a year or so ago SGI bought COPAN for what seemed like fractional pennies on the dollar, well they recently came out with the next generation of COPAN and I'm still amazed at how much storage they can fit in a rack.
ArcFiniti comes in 5 factory-configured models to suit any archive environment. Lower-capacity models can be upgraded to higher capacity, maxing out at just over 1.4PB of usable archive in a single rack.
Full specifications don't seem to be disclosed at the moment. The original COPAN systems topped out at a hefty 3,000 pounds per rack, the only storage system I had heard of that weighed in at more than 3PAR (about 2,000 pounds max per rack).
The original systems kept roughly 75% of the drives spun down at any given point.


