TechOpsGuys.com Diggin' technology every day

June 30, 2012

Synchronized Reboot of the Internet

Filed under: linux — Tags: — Nate @ 7:37 pm

[UPDATE – I’ve been noticing some people claim that kernels newer than 2.6.29 are not affected, well I got news for you, I have 200+ VMs that run 2.6.32 that say otherwise (one person in the comments mentions Kernel 3.2 is impacted too!) 🙂 ]

[ UPDATE 2 – this is a less invasive fix that my co-worker has tested on our systems:

date -s "`date -u`"

]
Been fighting a little fire that I'm sure hundreds if not thousands of others are fighting as well. It started just before midnight UTC, when a leap second was inserted into our systems, and that seemed to trip a race condition in Linux that I assume most people thought was fixed, but I guess nobody tested it.

[3613992.610268] Clock: inserting leap second 23:59:60 UTC

 

The behavior, as I'm sure you're all aware of by now, is a spike in CPU usage. Normally our systems run on average under 8% CPU usage, and this pushed them up ten-fold. Fortunately vSphere held up and we had the capacity to eat it, and the resource pools helped make sure production had its share of CPU power. Only minimal impact to the customers – our external alerting never even went off, which was a good sign.

CPU spike on a couple hundred VMs all at the same time (the above cluster has 441GHz of CPU resources)

We were pretty lost at first. Fortunately my co-worker had a thought that maybe it was leap second related; we dug into things more and eventually came across this page (thanks Google for being up to date), which confirmed the theory and confirmed we weren't the only ones impacted by it. Fortunately our systems were virtualized on a platform that was not impacted by the issue, so we did not experience any issues on the bare metal, only in the VMs. From the page:

Just today, Sat June 30th – starting soon after the start of the day GMT. We’ve had a handful of blades in different datacentres as managed by different teams all go dark – not responding to pings, screen blank.

They’re all running Debian Squeeze – with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I’ve also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I’m wondering.

It wasn't long after that we started getting more confirmations of the issue from pretty much everyone out there. We haven't dug into more of a root cause at this point – we've been busy rebooting Linux VMs, which seems to be a good workaround (didn't need the steps indicated on the page). Even our systems that are up to date with kernel patches as recently as a month ago were impacted. Red Hat apparently is issuing a new advisory for their systems since they were impacted as well.

Some systems behaved well under the high load; others were so unresponsive they had to be power cycled. There was usually one process chewing through an abnormal amount of CPU – on the systems I saw it was mostly Splunk and autofs. I think that was just coincidence though, probably whatever processes happened to be using CPU at the instant the leap second was inserted into the system.

The internet is in the midst of a massive reboot. I pity the foo who has a massive number of systems and has to co-ordinate some complex massive reboot (unless there is another way – for me reboot was simplest and fastest).
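For what it's worth there is another way – the date trick from the update at the top of this post. If you have key-based ssh access to all the affected VMs it's easy enough to push out without rebooting anything. A rough sketch is below; the host list is obviously a placeholder for however you enumerate your machines, it assumes passwordless sudo, and you should test it on one box first:

# Hypothetical host list - replace with however you enumerate your VMs
HOSTS="vm001 vm002 vm003"

for h in $HOSTS; do
    # Confirm the kernel actually inserted the leap second on this host
    ssh "$h" 'dmesg | grep -i "leap second"'
    # Re-set the clock from UTC - the same fix my co-worker tested (see the
    # update above), which cleared the CPU spin on our systems without a reboot
    ssh "$h" 'sudo date -s "$(date -u)"'
done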

I for one was not aware that a leap second was coming or of the potential implications; it's obvious I'm not alone. I do recall leap seconds in the past not causing issues for any of the systems I managed. I logged into my personal systems, including the one that powers this blog, and there are no issues on them. My laptop runs Ubuntu 10.04 as well (same OS rev as the servers I've been rebooting for the past 2 hours) and no issues there either (been using it all afternoon).

Maybe someday someone will explain to me, in a way that makes sense, why we give a crap about adding a second. I really don't care if the world is out of sync by a few seconds with the rest of the known universe; if it's that important we should have a separate scientific time or something, and let the rest of the normal folks go about their way. Same goes for daylight savings time. Imagine the power bill as a result of this fiasco, with thousands to hundreds of thousands of servers spiking to 100% CPU usage all at the same time.

Microsoft will have a field day with this one I’m sure 🙂

 

Java and DNS caching

Filed under: General — Nate @ 12:20 pm

I wanted to write a brief note on this since it's a fairly widespread problem that I've encountered when supporting Java-based applications (despite the problem I much prefer supporting Java-based apps over any other language at this point, by leaps and bounds).

The problem is a really, really stupid default setting with regards to DNS caching in the java.security file. It's an old setting – I recall first coming across it, I want to say, in 2004 or even 2003. But it's still the default even today, and some big names out there apparently are not aware or do not care, because I come across this issue from a client perspective on what feels like a semi-regular basis.

I’ll let the file speak for itself:

#
# The Java-level namelookup cache policy for successful lookups:
#
# any negative value: caching forever
# any positive value: the number of seconds to cache an address for
# zero: do not cache
#
# default value is forever (FOREVER). For security reasons, this
# caching is made forever when a security manager is set. When a security
# manager is not set, the default behavior in this implementation
# is to cache for 30 seconds.
#
# NOTE: setting this to anything other than the default value can have
#       serious security implications. Do not set it unless
#       you are sure you are not exposed to DNS spoofing attack.
#
#networkaddress.cache.ttl=-1

If you're experienced with DNS at all you can probably tell right away the above is a bad default to have. The idea that changing it opens you up to DNS spoofing attacks is just brain dead, I'm sorry – you may very well be opening yourself to DNS spoofing attacks by caching those responses forever. I think back to a recent post of mine related to the Amazon cloud, specifically their Elastic Load Balancers – as terrible as they are, they also by design change IP addresses at random intervals, sometimes resulting in really bad things happening.

“Amazon Web Services’ Elastic Load Balancer is a dynamic load-balancer managed by Amazon. Load balancers regularly swapped around with each other which can lead to surprising results; like getting millions of requests meant for a different AWS customer.”

Swapping IPs at random is obviously heavily dependent upon all portions of DNS resolution operating perfectly. Ask anyone experienced with DNS what they do when they migrate from one IP to another and you'll very likely hear them say they keep the old IP active for X number of DAYS or WEEKS regardless of their DNS TTL settings, because some clients or systems simply don't obey them. This is pretty standard practice. When I moved that one company out of Fisher Plaza (previous post) to the new facility I stuck a basic Apache proxy server in the old data center for a month forwarding all requests to the new site (other things like SMTP/DNS were handled by other means).

Java takes it to a new level though, I'll admit that. Why that is the default is just – well, I really don't have words to answer that.
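If you're stuck supporting an app on top of this default, the workaround we normally apply is to give the JVM a finite TTL, either by editing java.security or via a system property at startup. A rough sketch – the JRE path and the 60 second value here are just examples, adjust them for whatever JVM your app actually runs on:

# Point this at the JRE your app actually runs under (example path)
JAVA_SECURITY=/usr/lib/jvm/java-6-sun/jre/lib/security/java.security

# Back up the file, then flip the commented-out default to a 60 second cache
sudo cp "$JAVA_SECURITY" "$JAVA_SECURITY.bak"
sudo sed -i 's|^#*networkaddress\.cache\.ttl=.*|networkaddress.cache.ttl=60|' "$JAVA_SECURITY"

# Alternatively, without touching the file, I believe the older
# sun.net.inetaddr.ttl system property does the same thing per-JVM:
#   java -Dsun.net.inetaddr.ttl=60 -jar yourapp.jar

Either way the JVM has to be restarted to pick it up.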

Fortunately Amazon EC2 customers have another solution considering how terrible ELB is: they can use Zeus (oh sorry, I meant Stingray). I've been using it for the past 7-8 months and it's quite nice – easy to use, very flexible and powerful, much like a typical F5 or Citrix load balancer (and much easier to manage than Citrix). It even integrates with EC2 APIs – it can use an Elastic IP to provide automatic fail over (fails over in 10-20 seconds if I recall right, much faster than any DNS-based system could). Because of the Elastic IP, the IP of the load balancer will never change. The only real downside to Zeus is it's limited to a single IP address (not Zeus' fault, this is an Amazon limitation). So you can only do one SSL cert per Zeus cluster, and the costs can add up quickly if you need lots of certs, since the cost of Zeus is probably 5-10x the cost of ELB (and worth every stinking penny too).

Oh, that and the Elastic IP is only external (another Amazon limitation – you may see a trend here – ELB has no static internal IP either). So if you want to load balance internal resources, say web server 1 talking to web server 2, you either have to route that traffic out to the internet and back in, or to the internal IP of the Zeus load balancer – and manually update the configuration if/when the load balancer fails over, because the internal IP will change. I was taught a long time ago to put everything behind a VIP. Which means extensive use of internal load balancing for everything from HTTP to HTTPS to DNS to NTP to SMTP – everything. With the Citrix load balancers we've extended our intelligent load balancing to MySQL, since they have native support for MySQL now (not aware of anyone else that does – Layer 4 load balancing doesn't count here).

Amazon Cloud: Two power outages in two weeks

Filed under: Datacenter — Tags: , — Nate @ 11:54 am

By now you should know I'm no fan of Amazon's cloud; it makes me feel like I'm stuck in the 90s when I use it. I've been using it quite a bit for the past two years (with two different companies) but I'm finally about to get the hell out of there. The last set of systems is set to migrate before my trip to Seattle.

Last week they had one outage in one of their availability zones, though it took them well over an hour to admit it was a power outage – they first tried to say "oh, some volumes are experiencing increased latency". What a load of crap. It should take all of 5 seconds to know there is a power outage. The stuff I manage had minor impact fortunately since we are down to just a few things left; we lost some stuff but none of it critical.

Then last night they had another one, which seems to have made some news too.

A slew of sites, including Netflix, Instagram and Pinterest, have gone down this evening, thanks to “power issues” at Amazon’s Elastic Compute Cloud data center in North Virginia. The websites rely on Amazon’s cloud services to power their services. Some pretty violent storms in the region are apparently causing the problems.

This had slightly more impact on stuff I'm responsible for; one of my co-workers handled the issues and it wasn't much to do, fortunately. I can only imagine the havoc at a larger organization like one of the above that depends more heavily on their cloud.

What a lot of people don't realize though is that these two outages aren't really considered outages in Amazon's mind, at least for that region, because only one data center (or part of one data center) went off line. Their SLA is worded so that they exempt themselves from the effects of such an outage and put the onus on the customer to deal with it. I suspect these facilities aren't even Tier IV, because Tier IV is expensive and Amazon is about cheap. If they were Tier IV a simple storm wouldn't have caused equipment to lose power.

I remember a couple of years ago the company I was at had some gear co-located near Chicago at an Equinix site, and some big storms and flooding, if I remember right, rolled through. We didn't have redundant power of course (more on that below), but there was no impact to the equipment other than an email to us saying the site was on generator power for some time and then another email saying the site was back on utility power.

There are exceptions of course, poor design being one. I think back to what was once Internap's premier data center in Seattle – Fisher Plaza – which was plagued by power issues and eventually suffered more than 24 hours of downtime due to a fire, knocking out many well known sites like Bing Travel as well as many others. It took them months to repair the facility; they had generator trucks sitting out front providing power 24/7 during the repairs. From a storage perspective I remember being told stories of at least one or two customers' NetApp equipment taking more than 24 hours to come back online (file system checks), and I'm sure folks that had battery backed cache were in sort of a panic not knowing when or if power would be restored to the facility. Some of my friends were hosted there at another company with a really small 3PAR array and were not worried though, because 3PAR systems dump their cache to an internal disk on the controller when the power goes out, so batteries are not required past that point. Since cache is mirrored there are two copies of it stored on different disks. Some newer systems have fancy flash-backed cache that is even nicer.

Fisher Plaza for a while had about one power outage per year, every year for at least 3 years in a row. Including the somewhat famous EPO event where someone went out of their way to hit the Emergency Power Off switch (there was no emergency) and shut down the facility. After that all customers had to go through EPO Training, which was humorous.

Being the good operations person that I am, shortly after I started work at a company back in 2006 that was hosted at Fisher Plaza I started working on plans to move them out – the power issues were too much to bear. We still had about nine months left in our contract and my boss was unsure how we could leave before that was up given it would cost a lot. I had an awesome deal on the table from a local AT&T facility which I had good experiences with (though density wise they are way out dated, and after an AT&T re-organization around 2008 I wouldn't even consider AT&T as a data center provider now). Anyways, I had this great deal and wanted to move but we had a hard time getting past the fact that we still owed a ton on the Internap contract and couldn't get out of it. Then Fisher Plaza had another power outage (this was in 2006, the fire was three years later). The VP said to us something along the lines of: I don't care what it takes, I want to get out of there now. Music to my ears – things got moving quickly and we moved out within a month or so. I was hosted at that AT&T data center for a good 5 years personally, and the companies I was at were hosted there for, I want to say, a good 8-9 years between the two without a single power event that I am aware of. I was there once when the facility lost power, but the data center floor was unaffected. I believe there were a few other power outages, but again nothing impacting customer equipment.

There are other bad designs out there too – personally I consider anything that relies on a flywheel UPS to be badly designed, because there isn’t enough time for on site personnel to manually try to correct a situation before the UPS runs out of juice.  I want at least 10-15 minutes at full load.

Internap later opened a newer, fancier data center down in Tukwila in a facility owned by Sabey. That is a massive campus – they claim 1.2M square feet of data center space – and there is a large Microsoft presence there as well. On one of my tours of the facility I asked their technical people whether they use a real UPS or a flywheel, and they said a real UPS. They commented how Microsoft, literally next door, used flywheels, and how Microsoft is seemingly constantly running their generators (far more frequently than your typical routine load testing); they did not know specifically why but speculated maybe they don't trust the flywheels, and laughed with me. That same Internap facility had another power outage shortly after it opened, though that one was human error. Apparently there was some fault in a UPS, some person did something bad, and the only way to fix it was to shut everything down. Internap claimed they addressed that problem by having every on site action double checked and signed off. I know people that are hosted there and have not heard of issues since the new policies were put in place.

Another reason is being a cheap bastard. I think Amazon falls into this area – they address it for their own applications with application level availability, global load balancing and fancy Citrix load balancers. I was at another company a few years ago that also fell into the cheap bastard area of not wanting to invest in redundant power. People view power as a utility that won't ever go down, especially in a data center – and this view is reinforced the longer you go without having a power outage. I remember a couple of outages at a really cheap co-location the company was using in Seattle, where some other customer plugged a piece of fancy Cisco gear in and for some reason it tripped the UPS, which knocked out a half dozen of our racks because they didn't have redundant power. So naturally we had an outage due to that. The same thing happened again a few weeks later, after the customer replaced the Cisco gear with a newer Cisco thing and the UPS tripped again. Don't know why.

The back end infrastructure was poorly designed as well: they had roughly two dozen racks all running off the same UPS, and none of them had redundant power (I thought, how hard can it be to alternate between UPSs every other rack? Apparently they didn't think of that or didn't want to spend for it). It was a disaster waiting to happen. They were lucky and did not have such a disaster while I was there. It was like pulling teeth to get them to commit to redundant power for the new 3PAR system, and even then they'd only agree to one UPS feed and one non-UPS feed. This had its own issues on occasion.

One of my former co-workers told me a story about a data center he used to work at – the worst of both worlds – bad design AND cheap bastard. They bought these generators and enclosed them somewhat in some sort of structure outside. Due to environmental regulations they could not test them very often – only a couple of minutes a month or something like that. Maybe the generators were cheap crappy ones that belched out more pollution than others, I don't know, but the point is they never could fully test them. They had a real power outage one day, and they went outside and watched as the generators kicked on. They were happy.

Then a few minutes later they shut down and the facility lost all power. WTF? They went and turned them on again, and a few minutes later they shut off again.  Apparently the structure they built around the generators did not leave enough space for cooling and the generators were overheating and shutting down.

Back to Amazon and their SLAs (or lack thereof). I'm torn between funny and sad when I see people attacking Amazon customers like Netflix or the other social things that are on their cloud when they go down as a result of an Amazon downtime. They rag on the customers for not making their software more resilient against such things. Amazon expects you to do this; they do it after all, and if Amazon can do it anyone can, right?

Yeah, reality is different. Most companies do not do that and probably never will. At a certain scale it makes sense, for some applications it makes sense. For the vast majority it does not, and the proof is in the pudding – most companies don’t do it. I’ve worked at two different companies that built their apps from the ground up in Amazon and neither made any considerations for this aspect of availability. I know there are folks out there that DO do this but they are in the small minority, who think they are hip because they can survive a data center going down without impacting things.

It’s far simpler, and cheaper to address the problem in a more traditional way with highly available infrastructure for the vast majority of applications anyways. Disasters do happen and you should still be prepared, but that’s far different from the Amazon model of “built to fail”. These aren’t the first power issues Amazon has had and certainly won’t be the last.

The main point to this post is trying to illustrate the difference in how the SLAs are worded, how the particular service provider responds, and how customers respond to the event.

A counter example I have brought up many times, a combination of a power issue AND a fire over at a Terremark facility a few years ago, resulted in no customer impact. Good design and no cheap bastards there.

Some irony here is that Amazon tries to recruit me about once every six months. I politely tell them I'm not interested – unless it's a person I know, in which case I tell them why I'm not interested – and believe me, I'm being incredibly polite here.

The current state of Infrastructure as a Service cloud offerings is just a disaster in general (there are some exceptions to parts of the rules here and there). Really everything about it is flawed from the costs to the availability to the basic way you allocate resources. For those of you out there that use cloud offerings and feel like you’ve traveled back in time I feel your pain, it’s been the most frustrating two years of my career by far. Fortunately that era is coming to a close in a couple of weeks and boy does it feel good.

This blog had a many-hour outage recently. Of course it's not powered by redundant systems, though it does have redundant power supplies (I suspect the rack doesn't have true redundant power, I don't know – it's managed co-location, though I own the server). A few nights ago there were some networking issues; I don't know the details, I haven't tried to find out. But the provider who gets me the service (I think they have a cage in the facility, they are a computer reseller) had their website on the same subnet as mine, and I saw that was unreachable as well.

Whatever it was, it was not a power issue since the uptime of my systems was unchanged once things got fixed. Though my bridging OpenBSD VM running pf on my ESXi system crashed for some reason (internal VMware error – maybe too many network errors), so I had to manually fire up the VM again before my other VMs could get internet access. Not the end of the world though, it's just one small server running personal stuff. As you might know I ran my server in the Terremark cloud for about a year while I transitioned between physical server hosts (last server was built in 2004, this one about a year ago). When I started thinking about off site backups, I very quickly determined that cloud wasn't going to cut it for the costs and it was far cheaper to just buy a server with RAID and put it in a co-lo. With roughly 3.6TB of usable capacity protected by RAID-10 on enterprise nearline SAS drives and a hardware RAID controller with battery backed cache, I'm happy.

June 26, 2012

In Seattle area July 13 – 22nd – and at Velocity 2012 tonight

Filed under: General — Tags: — Nate @ 7:41 am

Hey folks!

For my friends up in SEA I wanted to announce that I plan to be in Bellevue from July 13th, leaving on the 22nd. Looking at that Wikipedia article:

In 2008, Bellevue was named number 1 in CNNMoney’s list of the best places to live and launch a business. More recently, Bellevue was ranked as the 4th best place to live in America.

I guess I was pretty lucky to both live and work in Bellevue in 2008! (I lived right across the street from work, even.) I do miss it, though it was growing too fast (for good reason I'm sure). The downtown area just exploded during the 11 years I lived there (I lived downtown). My new home doesn't seem to have won any awards, at least according to Wikipedia. I still walk to work – but it's much further away (especially after a recent office move: now 0.8 miles each way, double what it used to be).

As I mentioned before I'm driving up the coast (at least part way, haven't decided how far past Crescent City, CA I will go). I thought about it and ordered a new camera which should get here soon, and I'm pretty excited to try it out. I wanted something with a more powerful optical zoom than the 12X I have now, so I thought I would have to go DSLR. After doing a bit of checking it seems that typical DSLR zoom lenses aren't that impressive (from a zoom factor standpoint at least). So I came across the Nikon P510 and its unbelievable 42X optical zoom (I was expecting to get something more like 20X), so it wasn't a very hard decision to make. The camera is bigger than what I have now, but it's about the same size as a Kodak camera I had before which had 10X zoom (bought in 2005).

I’m very much an amateur picture taker (I couldn’t bring myself to use the term photographer because I suck), but I really did like the scenery along the western coast of the country the last trips I took.

So not only does it have a massive zoom, just what I wanted, but the price is good too! To think I was thinking about spending $1500 for a short while, until I determined the DSLR zoom wasn't as high of a zoom factor as I thought it would be (given the price). I know the picture quality of the DSLRs is much better – or at least can be in the right hands (my hands are not the right hands).

Anyways, back on topic – my trip to Seattle, I mean to Bellevue. I'm staying at the Larkspur Hotel, a chain I had not heard of before, but it looks like a really nice place and it's close to my former home. The plans are not 100% finalized but I think I am 95-98% sure at this point. I opted for a refundable room just in case 🙂

Of course I plan to be at the usual places while I'm there. I'll be working for a few days at least, since most of the action will be at night and on weekends anyways.

Also, if you're in the Bay Area and aren't doing anything else tonight, there is a Music + Tech party sponsored by Dynect (a DNS provider I've been using for a few years – they have great service) in Santa Clara tonight at 8PM. Barring an emergency or something I'm planning to be there. Unlike the main Velocity 2012 conference, this event does not require a pass/tickets to attend.

June 25, 2012

Exanet – two years later – Dell Fluid File system

Filed under: Storage — Tags: , — Nate @ 10:21 pm

I was an Exanet customer a few years ago, up until they crashed. They had a pretty nice scale-out NFS cluster – well, at least it worked well for us at the time and it was really easy to manage.

Dell bought them over two years ago, hired many of the developers, and I guess has been making the product better over the past couple of years. Really I think they could have released a product – wait for it – a couple of years ago, given that Exanet was simply a file system that ran on top of CentOS 4.x at the time. Dell was in talks with Exanet at the time they crashed to make Exanet compatible with an iSCSI back end (because really, who else makes a NAS head unit that can use iSCSI as a back end disk?). So even that part of the work was pretty much done.

It was about as compatible as you could get really; it would be fairly trivial to certify it against pretty much any back end storage. But Dell didn't do that, they sat on it making it better (one would have to hope, at least). I think at some point along the line, perhaps even last year, they released something in conjunction with Equallogic – I believe that was going to be their first target at least, but with so many different names for their storage products I'm honestly not sure if it has come out yet or not.

Anyways that’s not the point of this post.

Exanet clustering, as I've mentioned before, was sort of like 3PAR for file storage. It treated files like 3PAR treats chunklets. It was highly distributed (but lacked the data movement and re-striping abilities that 3PAR has had for ages).

Exanet File System Daemon (FSD) – a software controller for files in the file system, typically one per CPU core. A file had a primary FSD and a secondary FSD, and new files would be distributed evenly across all FSDs.

One of the areas where I thought the product needed more work was being able to scale up more. It was a 32-bit system, so it inherited your typical 32-bit problems, like memory performance going in the tank when you try to address large amounts of memory. When their sun was about to go supernova they told me they had even tested up to 16-node clusters on their system; they could go higher, there just wasn't customer demand.

3PAR too was a 32-bit platform for the longest time, but those limitations were less of an issue for it because so much of the work was done in hardware – it even has physical separation of the memory used for the software vs the data cache. Unlike Exanet, which did everything in software and of course shared memory between the OS and data cache. Each FSD had its own data cache, something like up to 1.5GB per FSD.

Requests could be sent to any controller, any FSD; if that FSD was not the owner of the file it would send a request over a back end cluster interconnect and proxy the data for you, much like 3PAR does in its clustering.

I believed it was a great platform to just throw a bunch of CPU cores and gobs of memory at – it ran on an x86-64 PC platform (IBM dual socket quad core was their platform of choice at the time). 8, 10 and 12 core CPUs were just around the corner, as were servers which could easily get to 256GB or even 512GB of memory. When you're talking software licensing costs in the tens of thousands of dollars – give me more cores and RAM, the cost is minimal on such a commodity platform.

So you can probably understand my disappointment when I came across this a few minutes ago, which tries to hype up the upcoming Exanet platform.

  • Up to 8 nodes and 1PB of storage (Exanet could do this and more 4 years ago – though in this case it may be a Compellent limitation, as they may not support more than two Compellent systems behind an Exanet cluster – the docs are unclear) — Originally Exanet was marketed as a system that could scale to 500TB per 2-node pair. Unofficially they preferred you had less storage per pair (how much less was not made clear – at my peak I had around, I want to say, 140TB raw managed by a 2-node cluster? It didn't seem to have any issues with that; we were entirely spindle bound)
  • Automatic load balancing (this could be new – assuming it does what it implies – though the more I think about it, I'd bet it does not do what I think it should do and probably does the same load balancing Exanet did four years ago, which was less load balancing and more round robin distribution)
  • Dual processor quad core with 24GB – same controller configuration I got in 2008 (well, the CPU cores are newer) — Exanet's standard was 16GB at the time, but you could do a special order and get 24GB, though there was some problem with 24GB at the time that we ran into during a system upgrade; I forget what it was.
  • Back end connectivity – 2 x 8Gbps FC ports (switch required) — my Exanet was 4Gbps I believe and was directly connected to my 3PAR T400, queue depths maxed out at 1500 on every port.
  • Async replication only – Exanet had block based async replication in late 2009/early 2010. Prior to that they used a bastardized form of rsync (I never used either technology)
  • Backup power – one battery per controller. Exanet used old fashioned UPSs in their time, not sure if Dell integrated batteries into the new systems or what.
  • They dropped support for Apple File Protocol. That was one thing Exanet prided themselves on at the time – they even hired one of the guys that wrote the AFP stack for Linux, and they were the only NAS vendor (that I can recall) at the time that supported AFP.
  • They added support for NDMP – something BlueArc touted to us a lot at the time but we never used it, wasn’t a big deal. I’d rather have more data cache than NDMP.

I mean, from what I can see I don't really see much progress over the past two years. I really wanted to see things like:

  • 64-bit (the max memory being 24GB implies to me still a 32-bit OS+ file system code)
  • Large amounts of memory – at LEAST 64GB per controller – maybe make it fancy and make it flash-backed? RAM IS CHEAP.
  • More cores! At least 16 cores per controller, though I’d be happier to see 64 per controller (4x Opteron 6276 @ 2.3Ghz per controller) – especially for something that hasn’t even been released yet. Maybe based on Dell R815 or R820
  • At least a 16-node configuration (the number of blades you can fit in a Dell blade chassis, perhaps running the Dell M620 – not to mention this level of testing was pretty much complete two and a half years ago).
  • SSD Integration of some kind – meta data at least? There is quite a bit of meta data mapping all those files to FSDs and LUNs etc.
  • Clearer indication that the system supports dynamic re-striping as well as LUN evacuation (LUN evacuation especially is something I wanted to leverage at the time – the more LUNs you had, the longer the system took to fail over. In my initial Exanet configuration the 3PAR topped out at 2TB LUNs; later they expanded this to 16TB but there was no way from the Exanet side to migrate to them, and Exanet being fully distributed worked best if the back end was balanced, so it wasn't a best practice to have a bunch of 2TB LUNs and then start growing by adding 16TB LUNs – you get the idea) – the more I look at this pdf the less confident I am in them having added this capability (that PDF also indicates using iSCSI as a back end storage protocol).
  • No clear indication that they support read-write snapshots yet (all indications point to no). For me at the time it wasn't a big deal; snapshots were mostly used for recovering things that were accidentally deleted. They claim high performance with their redirect-on-write – though in my experience performance was not high. It was adequate with some tuning. They claimed unlimited snapshots at the time, but performance did degrade on our workloads with a lot of snapshots.
  • A low end version that can run in VMware – I know they can do it because I have an email here from 2 years ago that walks you through step by step instructions installing an Exanet cluster on top of VMware.
  • Thin provisioning friendly – Exanet wasn't too thin provisioning friendly at the time Dell bought them, and no indication from what I've seen says that has changed (especially with regards to reclaiming storage). The last version Exanet released was a bit more thin provisioning friendly, but I never tested that feature before I left the company; by then the LUNs had grown to full size and there wasn't any point in turning it on.

I can only react based on what I see on the site – Dell isn't talking too much about this at the moment it seems, unless perhaps you're a close partner and sign an NDA.

Perhaps at some point I can connect with someone who has in depth technical knowledge as to what Dell has done with this fluid file system over the past two years, because really all I see from this vantage point is they added NDMP.

I’m sure the code is more stable, easier to maintain perhaps, maybe they went away from the Outlook-style GUI, slapped some Dell logos on it, put it on Dell hardware.

It just feels like they could have launched this product more than two years ago minus the NDMP support (take about 1 hour to put in the Dell logos, and say another week to certify some Dell hardware configuration).

I wouldn't imagine the SpecSFS performance numbers would have changed a whole lot as a result; maybe it would be 25-35% faster with the newer CPU cores (those SpecSFS results are almost four years old). Well, performance could be boosted more by the back end storage. Exanet used to use the same cheap LSI crap that BlueArc used to use (and perhaps still does in some installations on the low end). Exanet even went to the IBM OEM version of LSI, and wow have I heard a lot of horror stories about that too (like entire arrays going off line for minutes at a time, IBM not being able to explain how or why, then all of a sudden they come back as if nothing happened). But one thing Exanet did see time and time again: performance on their systems literally doubled when 3PAR storage was used (vs their LSI storage). So I suspect fancy Compellent tiered storage with SSDs and such would help quite a bit in improving front end performance on SpecSFS. But that was true when the original results were put out four years ago too.

What took so long? Exanet had promise, but at least so far it doesn’t seem Dell has been able to execute on that promise. Prove me wrong please because I do have a soft spot for Exanet still 🙂

June 22, 2012

NetApp Cluster SPC-1

Filed under: Storage — Tags: , — Nate @ 7:19 pm

Sorry for the off topic posts recently – here is something a little more on topic.

I don't write about NetApp much, mainly because I believe they have some pretty decent technology; they aren't a Pillar or an Equallogic. Though sometimes I poke fun. BTW, did you hear about that senior Oracle guy that got canned recently and the comments he made about Sun? Oh my, was that funny. I can only imagine what he thought of Pillar. Then there are the folks saying Oracle is heavily discounting software so they can sell hardware at list price, thus propping up the revenues – net result is the Oracle software folks hate Sun. Not a good situation to be in. I don't know why Oracle couldn't have just been happy owning the BEA JRockit JVM and let Sun wither away.

Anyways…

NetApp tried to make some big news recently when they released their newest OS, Ontap 8.1.1. For such a minor version number change (8.1 -> 8.1.1) they sure did try to raise a big fuss about it. Shortly after 8.1 came out I came across some NetApp guy's blog touting this release quite heavily. I was interested in some of the finer points and tried to ask some fair technical questions – I like to know the details. Despite me being a 3PAR person I tried really hard to be polite and balanced, and the blogger was very thoughtful, informed and responsive and gave a great reply to my questions.

Anyways, I'm still sort of unclear on what is really new in 8.1.1 vs 8.1 – it sounds to me like it's just some minor changes on the technical side with some new marketing slapped on top. Well, I think the new hybrid aggregates are perhaps specifically new to 8.1.1 (also, I think, some new form of Ontap that can run in a VM for small sites). Maybe 8.1 by itself didn't make a big enough splash. Or maybe 8.1.1 is what 8.1 was supposed to be (I think I saw someone mention that perhaps 8.1 was a release candidate or something). The SpecSFS results posted by NetApp for their clusters are certainly pretty impressive from a raw performance standpoint. They illustrate excellent scalability up to 24 nodes.

But the whole story isn’t told in the SpecSFS results – partially because things like cost are not disclosed in the results, partially because it doesn’t illustrate the main weakness of the system in that it’s not a single file system, it’s not automatically balanced from either a load or a space perspective.

But I won't harp on that much; this post is about their recent SPC-1 results which I just stumbled upon. These are the first real SPC-1 results NetApp has posted in almost four years – you sort of have to wonder what took them so long. I mean, they did release some SPC-1E results a while back, but those are purely targeting energy measurements. For me at least, energy usage is probably not even in the top 5 things I look for when I want some new storage. The only time I really care about energy usage is if the installation is really, really small – I mean the whole site being less than one rack. Energy efficiency is nice but there are a lot of things that are higher on my priority list.

This SPC-1 result from them is built using a 6-node cluster, 3TB of flash cache and 288GB of data cache spread across the controllers, and only 432 disks – 144 x 450GB per pair of controllers, protected with RAID DP. The cost given is $1.67M for the setup. They say it is list pricing – so not being a customer of theirs I'm not sure if it's apples to apples compared to other setups; some folks show discounted pricing and some show list. I would think it would heavily benefit the tester to illustrate the typical price a customer would pay for the configuration.

  • 250,039 IOPS  @ 3.35ms latency  ($6.89 per SPC-1 IOP)
  • 69.8TB Usable capacity ($23,947 per Usable TB)

Certainly a very respectable I/O number and really amazing latency – I think this is the first real SPC-1 result that is flash accelerated (as opposed to being entirely flash).

What got me thinking though was the utilization. I ragged on what could probably be considered a tier 3 or 4 storage company a while back for just inching by the SPC-1 minimum efficiency requirements. The maximum unused storage cannot exceed 45% and that company was at 44.77%.

Where's NetApp with this? Honestly higher than I thought, especially considering RAID DP – they are at 43.20% unused storage. I mean really, would it not make more sense to simply use RAID 10 and get the extra performance? I understand that NetApp doesn't support RAID 10, but it just seems a crying shame to have such low utilization of the spindles. I really would have expected the flash cache to allow them to drive utilization up. But I suppose they decided to inch out more performance at the cost of usable capacity. I'd honestly be fascinated to see results where they drive the unused storage ratio down to, say, 20%.

The flash cache certainly does a nice job of accelerating reads and letting the spindles handle more writes as a result. Chuck over at EMC wrote an interesting post where he picked apart the latest NetApp release. What I found interesting from an outsider perspective is how so much of this new NetApp technology feels bolted on rather than integrated. They seem unable to adapt the core of their OS to this (now old) scale out Spinnaker stuff even after this many years have elapsed. From a high level perspective the new announcements really do sound pretty cool. But once I got to know more about what's on the inside, I became less enthusiastic about them. There's some really neat stuff there, but at the same time some pretty dreadful shortcomings in the system still (see the NetApp blog posts above for info).

The plus side though is that at least parts of NetApp are becoming more up front about where they target their technology. Some of the posts I have seen recently, both in comments on The Register as well as the NetApp blog above, have been really excellent. These posts are honest in that they acknowledge they can't be everything to everyone; they can't be the best in all markets. There isn't one storage design to rule them all. As EMC's Chuck said – compromise. All storage systems have some degree of compromise in them; NetApp always seems to have had less compromise on the features and more compromise on the design. That honesty is nice to see coming from a company like them.

I met with a system engineer of theirs about a year ago now when I had a bunch of questions to ask and I was tired of getting pitched nothing but dedupe. This guy from NetApp came out and we had a great talk for what must’ve been 90 minutes. Not once was the word dedupe used and I learned a whole lot more about the innards of the platform. It was the first honest discussion I had had with a NetApp rep in all the years I had dealt with them off and on.

At the end of the day I still wasn't interested in using the storage, but felt that hey – if some day I really feel a need to combine the best storage hardware with what many argue is the best storage software (management headaches aside, e.g. no true active-active automagic load balanced clustering), I can – just go buy a V-series and slap it in front of a 3PAR. I did it once before (really only because there was no other option at the time). I could do it again. I don't plan to (at least not at the company I'm at now), but the option is there. Just as long as I don't have to deal with the NetApp team in the Northwest and their dirty, underhanded, threatening tactics. I'm in the Bay Area now so that shouldn't be hard. The one surprising thing I heard from the reps here is they still can't do evaluations, which just seems strange to me. The guy told me if a deal hinged on an evaluation he wouldn't know what to do.

3PAR of course has no such flash cache technology shipping today, something I've brought up with the head of HP storage before. I have been wanting them to release something like it for some time now (more specifically, something more like EMC's FAST Cache – EMC has made some really interesting investments in flash over recent years, but like NetApp, at least for me the other compromises involved in using an EMC platform don't make me want to use it over a 3PAR even though they have this flash technology). I am going to be visiting 3PAR HQ soon and will learn a lot of cool things, I'm sure, that I won't be able to talk about for some time to come.

June 17, 2012

Can Sprint do anything else to drive me away as a customer

Filed under: General — Tags: — Nate @ 2:15 pm

I write this in the off chance it shows up in one of those Google alert searches that someone over at Sprint may be reading.

I’ll keep it pretty short.

I've been a customer of Sprint for 12 years now, and for the most part I had been a happy customer. The only problems I had with Sprint were when I had to deal with customer service; the last time I had a customer service issue was in 2005.

I dropped Sprint last year for AT&T in order to use the GSM HP Pre 3, which I ordered at high cost from Europe. My Sprint Pre basically died early last year and I sat with a feature phone to fill the gap. Sprint charged me something like $100 or so to change to AT&T even though I was no longer in contract (I paid full price for that feature phone). I think that was wrong, but whatever, I didn't care.

Fast forward to late last year/early this year – I’m still a Sprint customer – not a phone customer but a Mifi 3G/4G customer. I bought the Mifi almost two years ago primarily for on call type stuff. I hardly ever use it. I’d be surprised if I did 15GB of data transfer over the past 2 years.

Sprint sent me my usual bill, and it had some note about e-billing in it. Whatever, I paid my bill and didn’t think about it. I don’t like e-billing, I don’t want e-billing.

Given I don't use the Mifi much, it was about a month later that I tried to use it, only to find my service disconnected. Whatever, it wasn't that important at the time.

Later I got some letter saying Hey pay your bill to reconnect your service! I still didn’t care since I wasn’t using it anyways.

Later I got some collection notices from a collection agency (the first time I've ever gotten one of those). So I figured I should pay. I called Sprint, paid the bill, and the first rep I talked to swore I was on paper bills and it must be a problem with the post office. I could see perhaps the post office missing one bill (I don't recall them ever missing any in the past 15 years), but not more than one. She swore it was set right on their side. So I hung up, not really knowing what to do next.

I logged onto their site and it clearly said I was signed up for e-billing. So without changing it back to paper I called back and got another rep. This rep said I was signed up for e-billing. I asked her why – she said it only would have happened if I had requested it. I told her I did not, and she looked at the call logs going back SEVEN YEARS and confirmed I never requested it. I asked, did Sprint do some kind of bulk change to e-billing? She said no. I asked how it got set; she didn't know.

I asked, first, to change it back to paper billing, and second, for some kind of notification whenever it changes. She said notifications go out via SMS. Well, I am on a Mifi – no SMS here. She updated the system to send me notifications via the mail (I would hope their system would detect that the only device they have for me doesn't support SMS and automatically use another method, but no, it doesn't). There wasn't more she could do; I was transferred to another rep who said about the same thing. They assured me it was fixed.

Shortly thereafter I got a paper bill, and I paid it like I always do, yay no more problems.

Fast forward more time (a month or two, who knows) to today – I get another bill-like thing in the mail from Sprint. But it's not a bill – it's another one of those "Hey, pay your bill so you can get your service back" things. Here I was thinking it felt like too much time had elapsed since my last bill.

Ok, now I'm F$##$ pissed. WTF. I would hope that their system would notice this guy stopped paying his bills right when he was switched to e-billing, so there is probably not a coincidence here. Obviously they have no such system.

So I logged onto Sprint again, and could not find the 'billing preferences' part of the site that I found earlier in the year; the only link was 'switch to e-billing to be green'. I didn't want to click it as I wasn't sure what it would do – I did not want to click it and have it sign me up for e-billing. I wanted the transaction to be logged in a different way – I didn't want them to be able to say HEY, you asked our system today to switch you.

So I called again, and sort of like my experience in 2005 I had a hard time getting to a customer service rep. But I managed to, by telling their system I wanted to be a new customer and then asking that rep to transfer me to the billing department. On one of my attempts I hit zero so many times (sometimes that works to get to an operator) that their system just stopped responding – it sat there for a good two minutes in silence before I hung up and tried again.

THAT rep then confirmed that I HAD requested to be set to PAPER BILLS earlier in the year, but apparently the other reps forgot a little hidden setting that says "PAPER BILLS ALL THE TIME" versus "PAPER BILL ON DEMAND". They didn't have ALL THE TIME set, despite my repeated questions to the original reps.

Despite how pissed off I was on both occasions, I was very polite to the reps. I didn't want to be one of those customers; this time around it was significantly more difficult to keep calm but I managed to do so. But Sprint sure is trying hard to lose me as a customer. There's really no reason for me to stay other than I'm locked into the contract until September. On a device I hardly ever use (maybe I've transferred 150 megabytes the entire year so far).

Sprint is ditching the WiMax network (which is obviously what the 4G portion of my Mifi uses), they've neutered their premier membership stuff (as a 12 year customer I was and still am a member), they've bet their future on the iPhone (a device I will never use), and they've messed up my billing twice in less than six months and then think I'm the bad guy for not paying my bills (well, the reps didn't say that, but it certainly feels that way when collection agencies start contacting me).

On top of that I’ve committed a lot of money to GSM Palm phones which means I pretty much have to stick to AT&T for the foreseeable future, if I want to use those phones. I suppose T-mobile is technically an option but the frequency differences make for a degraded experience.

There are so many other ways the situation could have been handled better by technology – the most simple, as I mentioned: I don't pay the bill unless I get a paper bill in the mail. I don't like e-bills. Sprint did not make any attempt to contact me by mail or by phone (I'm not a voice customer anymore, though I expect they should have my phone # as a contact number since it hasn't changed in 12 years) when they put me back on e-billing for a second time in six months.

I’m probably one of the well sought out customers – pretty low usage – but subscribe to unlimited plans because I don’t like surprises in the mail and I don’t want to have to worry about minutes in the event something comes up and I have a long phone call to make or something. I must be gold for their Mifi stuff since I almost never use the thing, but I still pay the $50/mo to keep the service up.

Will Sprint F*$@ up again between now and 9/5/2012? I'm counting the days until my contract is up; I really don't see what they could possibly do to keep me as a customer at this point.

I guess 1385 words is short for me.

The old Linux ABI compatibility argument

Filed under: Random Thought — Tags: — Nate @ 12:45 pm

[WARNING: Rambling ahead]

Was just reading a somewhat interesting discussion over on slashdot (despite the fact that I am a regular there I don’t have an account and have posted as an Anonymous Coward about 4 times over the past twelve years).

The discussion is specifically about Linus slamming NVIDIA for their lack of co-operation in open sourcing their drivers and integrating them into the kernel.

As an NVIDIA customer going back to at least 1999, I think, when I got a pair of cheap white-boxed TNT2 PCI graphics cards from Fry's Electronics, I can say I've been quite happy with their support of Linux. Sure I would love it if it was more open source and integrated and stuff, but it's not that big of a deal for me to grab the binary drivers from their site and install them.

I got a new Dell desktop at work late last year, and specifically sought out a model that had Nvidia in it because of my positive experience with them (my Toshiba laptop is also Nvidia-based). I went ahead and installed Ubuntu 10.04 64-bit on it, and it just so happens that the Nvidia driver in 10.04 did not support whatever version of graphics chip was in the Dell box – it worked ok in safe mode but not in regular 3D/high performance/normal mode. So I went to download the driver from Nvidia's site, only to find I had no network connectivity. It seems the e1000e driver in 10.04 also did not support the network chip that happened to be in that desktop. So I had to use another computer to track down the source code for the driver and copy it over via USB or something, I forget. Ever since then, whenever Ubuntu upgrades the kernel on me I have to boot to text mode to recompile the e1000e driver and re-install the Nvidia driver. As an experienced Linux user this is not a big deal to me. I have read too many bad things about Ubuntu and Unity, so I would much rather put up with the pain of the occasional driver re-install than have constant pain because of a messed up UI. A more normal user should perhaps use a newer distro version that hopefully has built in support for all the hardware (perhaps one of the Ubuntu offshoots that doesn't have Unity – I haven't tried any of the offshoots myself).
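For anyone curious, the post-kernel-upgrade ritual only takes a few minutes once the driver sources are sitting on disk. Roughly this – the version numbers and paths are just examples from memory, not gospel, and it assumes you're working with the Intel e1000e source tarball and the Nvidia .run installer, with the matching linux-headers package already installed:

# From a text console - the X server has to be down for the Nvidia installer
sudo service gdm stop

# Rebuild the out-of-tree Intel e1000e driver against the new kernel
# (requires the linux-headers package for the running kernel)
cd ~/src/e1000e-1.9.5/src      # example version - use whatever you downloaded
make clean && make
sudo make install
sudo depmod -a
sudo modprobe e1000e

# Re-run the Nvidia installer so it builds its module for the new kernel
sudo sh ~/src/NVIDIA-Linux-x86_64-295.59.run   # example version

sudo service gdm start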

One of the other arguments is that the Nvidia code taints the kernel, making diagnostics harder – this is true – though I struggle to think of a single time I had a problem where I thought the Nvidia driver was getting in the way of finding the root cause. I tend to run a fairly conservative set of software (I recently rolled back Firefox 13 64-bit on my Ubuntu at work to Firefox 3.6 32-bit due to massive stability problems with the newer Firefox – 5 crashes in the span of about 3 hours), so system crashes and stuff really aren't that common.

It's sad that the state of ATI video drivers on Linux is apparently still so poor despite significant efforts over the years in the open source community to make it better. If I'm remembering right, in the late 90s Weather.com invested a bunch of resources in getting ATI drivers up to speed to power the graphics on their sets. AMD seems to have contributed quite a bit of stuff themselves. But the results still don't seem to cut it. I've never, to my knowledge, used an ATI video card in a desktop/laptop setting on one of my own systems anyways. I keep watching to see if their driver/hardware situation on Linux is improving, but haven't seen much to get excited about over the years.

From what I understand Nvidia's drivers are fairly unified across platforms, and a lot of their magic sauce is in the drivers, less of it in the chips. So I myself can understand them wanting to protect that competitive edge. Provided they keep supplying quality product anyways.

Strangely enough the most recent kernel upgrade didn't impact the Nvidia driver but still of course broke the e1000e driver. I'm not complaining about that though, it comes with the territory. (My Toshiba laptop on the other hand is fully supported by Ubuntu 10.04, no special drivers needed – though I do need to restart X11 after suspend/resume if I expect to get high performance video, mainly in intensive 3D games. My laptop doesn't travel much and stays on 24×7, so not a big deal.)

The issue more than anything else is that even now, after all these years, there isn't a good enough level of compatibility across kernel versions or even across user land. So many headaches for the user would be fixed if this was made more of a priority. The counter argument of course is: open source the code and integrate it and it will be better all around. Except unless the product is particularly popular, it's much more likely (even if open source) that it will just die on the vine, not being able to compile against more modern libraries, and the binaries themselves will just end up segfaulting. Use the source, Luke, comes to mind here – I could technically try to hire someone to fix it for me (or learn to code myself) but it's not that important. I wish product X would still work and there isn't anything realistically I can do to make it work.

But even if the application (or game, or whatever) is old and no longer maintained, it may still be useful to people. Microsoft has obviously done a really good job in this department over the years. I was honestly pretty surprised when I was able to play X-Wing vs. TIE Fighter (1997) on my dual-processor Opteron with XP Professional (and reports say it works fine in Windows 7, provided you install it using another OS, because the installer is 16-bit and doesn’t work in 64-bit Windows 7). I could be wrong, but 1997 may have been even before Linux moved from libc5 to glibc.

I had been quietly hoping that, as time went on, some of these interfaces would stabilize as being good enough, but it doesn’t seem to be happening. One thing that does seem to have stabilized is iptables as the firewall of choice on Linux. I went through ipfwadm in kernel 2.0 and ipchains in 2.2, and by the time iptables came out I had basically moved on to FreeBSD for my firewalls (later OpenBSD, once pf came out). I still find iptables quite a mess compared to pf, but about the most complicated thing I have to do with it is transparent port redirection, and for that I just copy and paste config examples from older systems. It doesn’t bug me if I don’t end up using it much.
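For reference, the transparent redirect in question is just a one-line NAT rule; a minimal sketch (the interface name and port numbers are assumptions, not from my actual configs):

# send web traffic arriving on eth0 to a proxy listening locally on port 3128
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128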

Another piece of software I have grown to like over the years – and this one really is open source – is xmms (version 1). Basically a look-alike of the popular Winamp software, xmms v1 is a really nice, simple MP3/OGG player. I even used it in its original binary-only incarnation. Version 1 was abandoned years ago (they list Red Hat 9 binaries, if that gives you an idea), and version 2 is nothing remotely similar to version 1, so I’ve tried to stick with version 1. With today’s screen resolutions I like to keep it in double-size mode. Here’s a bug report on Debian from 2005 to give you an idea how old this issue is, but fortunately the workaround still works. Xmms still compiles (though I did need to jump through quite a few hoops, if I recall right) – for how long, I don’t know.

I remember a few months ago wanting to hook up my old Slingboxes again, which are used to stream TV over the internet (especially since I was going to be doing some traveling late last year and this year). I bought them probably 6-8 years ago and have not had them hooked up in years. Back then, in 2006, I was able to happily use WINE to install the Windows-based SlingPlayer and watch video. I tried again earlier this year and it doesn’t work anymore: the same version of SlingPlayer (the same .exe from 5+ years ago) doesn’t work on today’s WINE. I wasn’t the only one; a lot of other people had problems too (I could not find any reports of it working for anyone). Of course it still worked in XP. I keep the Slingbox turned off so it doesn’t wear out prematurely unless I plan to use it. And of course I forgot to plug it in before my recent trip to Amsterdam.

I look at a stack of old Linux games from Loki Software and am quite certain none of them will ever run again, while the Windows versions of those same games still happily run (some of them even run in Wine, of all things). It’s disappointing to say the least.

I’m sure I am more committed to Linux on the desktop/laptop than most Linux folks out there (who, more often than not, are running OS X), and I don’t plan to change – just keep chugging along, from the early days of staying up all night compiling KDE 0.something on Slackware to what I have in Ubuntu today.

I’m grateful that Nvidia has been able to put out such quality drivers for Linux over the years, and as a result I opt for their chipsets in my Linux laptops and desktops at every opportunity. I’ve been running Linux since, I want to say, 1998, when my patience with NT4 finally ran out. Linux was the first desktop system I was exposed to that didn’t seem to slow down or become less stable the more software you loaded on it (that stays true for me today as well). I never quite understood what I was doing, or what the OS was doing, that would prompt me to re-install Windows from the ground up at least once a year back in the mid 90s.

I don’t see myself ever going to OS X. I gave it an honest run for about two weeks a couple of years ago and it was just so different from what I’m used to that I could not continue using it. Even putting Ubuntu as the base OS on the Apple hardware didn’t work, because I couldn’t stand the trackpad (I like the nipple – who wouldn’t like a nipple? My current laptop has both and I always use the nipple) and the keyboard was missing a bunch of keys. I’m sure if I tried to forget all of the habits I have developed over the years and did things the Apple way it could have worked, but going and buying a Toshiba and putting Ubuntu 10.04 on it was (and remains) the path of least resistance for me to becoming productive on a new system (the second-least-resistance path, next to Linux, being a customized XP).

I did use Windows as my desktop at work for many years, but it was heavily, heavily customized with Blackbox for Windows as well as Cygwin and other tools – so much so that the IT departments didn’t know how to use my system (no Explorer shell, no Start menu). But it gave Windows a feel familiar from Linux, with mouse-over window activation (via the XP PowerToys – another feature OS X lacked, outside of the terminal emulator anyway) and virtual desktops (though no edge flipping). It took some time to configure, but once up and going it worked well. I don’t know how well it would work in Windows 7; the version of Blackbox I was using came out in the 2004/2005 time frame, though there are newer versions.

I do fear what may be coming down the pike from a Linux UI perspective though; I plan to stick with Ubuntu 10.04 for as long as I can. The combination of Gnome 2 plus some software called brightside, which allows for edge flipping (otherwise I’d be in KDE), works pretty well for me, even though I have to manually start brightside every time I log in – when it starts automatically it doesn’t work for some reason. The virtual desktop implementation isn’t as good as AfterStep, something I used for many years, but Gnome makes up for it in other areas where AfterStep fell short.
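One thing I keep meaning to try – purely a guess that it’s a startup race, not something I’ve verified – is delaying brightside a bit in the GNOME startup entry, e.g. setting the startup command to:

sh -c "sleep 15 && exec brightside"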

I’ve gotten even more off topic than I thought I would.

So – thanks, Nvidia, for making such good drivers over the years; they have made Linux on the desktop/laptop that much easier for me to deal with. The only annoying issue I recall was one on my M5 laptop, and that wasn’t limited to Linux and didn’t seem specific to Nvidia (or Toshiba).

Also thank you to Linus for making Linux and getting it to where it is today.

June 15, 2012

Life extending IPv4 Patent issued

Filed under: General — Tags: — Nate @ 8:23 am

I’ve worked at a few companies over the years; more than one of them crashed and burned, but one in particular has managed to hold on against the odds. Don’t ask me how, because I don’t know. Their stock price was $0.26 more than a decade ago (I recall my opening option price was in the $5 range back in the summer of 2000).

CNBC screen shot from a long time ago – it must have been good to be a treasury investor then, look at that yield!

They didn’t get into the patent game until years after they laid my friends and me off in 2002. Their primary business model at the time was making thin client software; their main competition was the likes of Citrix. I worked on the side that made the Unix/Linux variant, being an IT support/system admin for various Linux, Solaris, HP-UX, Tru64 and AIX systems along with a few select Windows NT boxes. One of my long-time friends worked on the Windows side of things at the other end of the country. The company acquired that technology from Corel, I believe in the late 90s, and still develops that product today. I’m not sure whether or not the Unix product is still developed; for a long time they had just a single developer on it.

Anyways, I write this because I was browsing their site yesterday while talking to that friend on the phone, and it turns out they were granted a groundbreaking new patent for cloud computing a couple of months ago. Though I think you’ll agree it’s much more applicable to extending the life of IPv4 – without this technology we would have run out of IPs a while ago.

U.S. Patent 8,117,298, issued February 14, 2012, is directed towards an easily configurable Web server that has the capability to host (or provide a home for) multiple Web sites at the same time. This patent helps small companies or individuals create the same kind of Web presence enjoyed by larger companies that are able to afford the cost of multiple, dedicated Web server machines.

This patent protects GraphOn’s unique technology for “configuring” a Web server that has the capability to host multiple Web sites at the same time – called “multi-homing”. This multi-homing capability of the Web server provides the appearance to users of multiple distinct and independent servers at a lower cost.
Functionally, a multi-homed Web server consists of, in effect, multiple virtual Web servers running on the same computer. The patent claims a design in which features can be readily added to one or more of the virtual servers. For example, a new software module having additional features or different capabilities might be substituted for an existing module on one of the virtual servers. The new features or capabilities may be added without affecting any other of the virtual servers and without the need to rebuild the Web server.

You can see the uses for this patent right? I mean pretty much everyone out there will immediately want to step in and license it because it really is groundbreaking.
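For anyone who hasn’t seen “multi-homing” in the wild: it’s what plain old name-based virtual hosting has done on every Apache box for ages. A minimal sketch on a Debian-style layout, with Apache 2.2 in mind (the host names, file names and paths are my assumptions, made up for illustration):

# two "virtual Web servers" on one box
sudo tee /etc/apache2/sites-available/two-sites <<'EOF'
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName site-one.example.com
    DocumentRoot /var/www/site-one
</VirtualHost>
<VirtualHost *:80>
    ServerName site-two.example.com
    DocumentRoot /var/www/site-two
</VirtualHost>
EOF
sudo a2ensite two-sites && sudo /etc/init.d/apache2 reload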

Another thing I learned from the patent itself, which I was not aware of, is that most web servers run under inetd –

Web servers, most of which are written for UNIX, often run under INETD (“eye-net-D”), an Internet services daemon in UNIX. (A daemon is a UNIX process that remains memory resident and causes specified actions to be taken upon the occurrence of specified events.) Typically, if more than sixty connection requests occur within a minute, INETD will shut down service for a period of time, usually five minutes, during which service remains entirely unavailable.
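For the curious, “running under INETD” looks roughly like the hypothetical /etc/inetd.conf entry below (the service name, user and rate cap are my assumptions; the “nowait.60” field is the sort of per-minute spawn limit the patent alludes to, and no serious web server has run this way in a very long time):

# fields: service  socket-type  protocol  flags[.max-spawns-per-minute]  user  server-program  arguments
http  stream  tcp  nowait.60  www-data  /usr/sbin/httpd  httpd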

This is not the first groundbreaking patent they’ve been issued over the years.

Back in 2008 they were issued

  • Patent 7,386,880 for some form of load balancing.
  • Patent 7,424,737 for some sort of bridging technology that converts between IP and non-IP protocols (the example given is satellite protocols).
  • Patent 7,360,244 for two-factor authentication against a firewall.

Back in 2007 they were issued

  • Patent 7,269,591 which talks about a useful business model where you can charge a fee to host someone’s personal web site
  • Patent 7,269,847 which is a very innovative technology involving configuration of firewalls using standard web browsers.
  • Patent 7,249,376 which covers multi homed firewalls and dynamic DNS updates
  • Patent 7,249,378 which seems to be about associating a dedicated DNS with users utilizing a VPN.

Unfortunately for GraphOn, they did not license the patent that would allow them to display more than the most recent five years of press releases on their site, so I tasked our investigative sasquatch with finding the information I needed to finish this post. Most, if not all, of the patents below were acquired through the acquisition of Network Engineering Software.

Harry, our investigative sasquatch

Back in 2006 they were issued

  • Patent 7,028,034 covers web sites that dynamically update and pull information from a database to do so. Fortunately for you non-profits out there the scope is limited to pay sites.

..in 2005

  • Patent 6,850,940 which I’m honestly surprised someone like Oracle didn’t think of earlier, it makes a lot of sense. This covers maintaining a network accessible database that can receive queries from remote clients.

..And Waaay back in 2001

  • Patent 6,324,528 which covers something along the lines of measuring the amount of time a service is in use. I think this would be useful for the cloud companies too; you need to be able to measure how much your users are using the service to really bill based on usage rather than on what is provisioned.

I suppose I should have bought my options when I had the chance. I mean, if I had bought in at the $5 option price I would be able to retire – well, maybe not, given the stock is trading in the $0.12 range. I felt compelled to get this news out again so that investors can wise up and see what an incredible IP portfolio this company has, and so the rest of the world can stand ready to license these key technologies.

Then I can go back in time and tell myself to buy those options, only to come forward in time and retire early. I’ll publish my time travel documents soon. I won’t ask you to license them from me, but you will have to stop at my toll booth in the year 2009 to pay the toll. You’re free to travel between now and Jan 1, 2009 without any fees though – think of it as a free trial.

June 14, 2012

Nokia’s dark future

Filed under: Random Thought — Tags: — Nate @ 8:00 am

Nokia made some headlines today, chopping a bunch of jobs, closing factories and stuff. With sales crashing and market share continuing to slip, their reliance on Windows Phone has been a bet that has gone bad, at least so far.

What I find more interesting, though, is what Microsoft has gotten Nokia to inflict upon itself. Nokia is basically investing much of its remaining resources into turning itself into a Microsoft shop, while its revenues decline and its market valuation plunges. There were apparently talks last year about Microsoft buying Nokia outright, but they fell through – for good reason. All Microsoft has to do is wait: Nokia is doing their bidding already, and the company’s valuation keeps shrinking as time goes on. From a brand-name standpoint Nokia doesn’t really exist in the smart phone world anymore, so there isn’t much to lose (other than the really good employees who may be jumping ship in the meantime – though I’m sure Nokia keeps MS aware of who is leaving so MS can contact them if it wants to try to hire them back).

At some point, barring a miracle, Nokia will get acquired. By investing so heavily in Microsoft technologies now, they are naturally preparing themselves for assimilation when that acquisition happens, and at the same time making themselves less attractive to most other buyers because they are so committed to the Microsoft platform. Other buyers may come in and say they want to buy the patents, or this piece or that piece, but then Microsoft can offer a much higher price because all of the other parts of the company have much more value to them.

Not that I think going the Microsoft way was a mistake. All too often I see people say that all Nokia had to do was embrace Android and they’d be fine. I don’t agree at all. Look at the Android marketplace: there are very few standouts, Samsung being the main one these days (Apple and Samsung receive 90%+ of the mobile phone profits), though I believe as recently as a year ago it was HTC, and they have since fallen from grace as well. There’s not enough to differentiate in the Android world; there are tons of handset makers, most of them absolute crap (very cheap components, breaks easily, names you’ve never heard of), and the tablets aren’t much better.

So the point here is that just being another me-too supplier of Android wasn’t going to cut it. To support an organization that large they needed something more extraordinary. Of course that is really hard to come up with, so they went to Microsoft. It’s too bad that Nokia, like RIM and even Palm (despite my being a WebOS fan and user, the WebOS products were the only Palm-branded products I have ever owned), floundered for so long before coming up with a real strategy.

HP obviously realized this as well, given that the HP TouchPad was originally supposed to run Android – before the Palm acquisition. Which would explain the random TouchPad (from an RMA) showing up in a customer’s hands running Android.

Palm’s time of course ran out prematurely last year (HP’s original plan gave Palm a three-year runway). Nokia and RIM still have a decent amount of cash on hand, and it remains to be seen whether they have enough time to execute on their plans. I suspect they won’t, with Nokia ending up at Microsoft; RIM, I don’t know. I think RIM would also make a good MS fit, primarily for the enterprise subscribers, though by the time the valuation is attractive enough (keeping in mind MS will have acquired Nokia) there may not be enough of those subscribers left. Unless RIM breaks apart: sell the enterprise business to someone like MS, and keep a smaller global organization supporting users in the markets where they still have a lot of growth, which seems to be the emerging markets.

Of course Nokia is not the only one making Windows Phone handsets, but that market is still so new (at least with the latest platform) that there was a better opportunity for them to stand out amongst the other players.

Speaking of the downfall of Nokia and RIM, there was a fascinating blog post a while back about the decline of Apple now that the founder is gone. It generated a ton of buzz, and I think the author makes a lot of very good and valid points.

Now that I’ve written that maybe my mind can move on to something else.
