Mar/108
The Smooth F5 Big-IP LTM Upgrade That Wasn’t
TechOps Guy: Tycen
A few weeks ago I attended an F5/VMware/Dell luncheon (where Dell failed to show up, something about a prelim ship date of 3 weeks out). After the event I talked to a couple of F5 engineers and asked them about upgrading our Big-IP LTM 3600’s from 9.4.7 to their latest 10.1.0. We have a redundant pair of 3600’s in active/standby mode. According to them, it was as easy as upgrading the standby node, failing over, and then upgrading the other node. We have a pretty basic config, not a lot of nodes/pools/virtual servers and no add-on modules (at this time). We used the default partitioning. Easy as pie.
I followed this F5 guide which for me basically boiled down to these steps:
1. # mkdir /shared/images
2. copy ISO to /shared/images
3. # cd /shared/images
4. # im (this copies over the image2disk utility, then prints a status message telling you that the im command is no longer supported and how to proceed)
5. # image2disk --instslot=HD1.2 --format=volumes
6. # switchboot -b HD1.2
7. reboot
The trouble started with step 5 above. It gave me an error that I needed to re-activate my keys. Not a big problem, but still made me nervous since I had a narrow window to do this upgrade in. But, re-activation was easy through the web interface (System > License > Re-activate).
The next issue was scarier: after I re-activated, I re-issued the command in step 5 and the Big-IP rebooted automatically (no mention of this in the upgrade doc linked to above). And it took FOREVER to reboot. I’m sure it was doing a lot of really tricky stuff (reformatting and upgrading), but it was still an anxious wait. For me it was about 12 minutes (the linked upgrade guide says between 3 and 7 minutes). I was just about to put my shoes on and head to the datacenter (30 miles away) when the pings started responding.
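For anyone who would rather not watch a ping window by hand during that wait, here is a minimal sketch of a polling loop with a timeout. It is plain POSIX shell meant to run on a monitoring host, not on the Big-IP itself; the function name wait_for and the hostname in the usage line are hypothetical, and the 20-minute budget is just an assumption based on the 12 minutes seen here.

```shell
#!/bin/sh
# wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds or roughly
# TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
# (Hypothetical helper, not part of any F5 tooling.)
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example (hostname is a placeholder): give the node up to 20 minutes,
# checking every 10 seconds.
#   wait_for 1200 10 ping -c 1 bigip-standby.example.com \
#       && echo "node is answering pings" \
#       || echo "still down, time to find your shoes"
```

A ping reply only says the management address is up; it says nothing about whether the config loaded, so it is a first signal rather than a health check.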
This is where things got ugly. When the newly upgraded node came back online, it took over and became the ACTIVE node! I was just barely getting logged into it when my internal monitoring reported that the load balancer had failed over. And that wouldn’t have been too bad, because of course I had done a config sync before I started this whole process, except that the now-active node couldn’t load its config (more on that below). It was sitting there with a blank config (it had the correct self IPs and HA config) and users were getting nothing, not even a maintenance page. So, I forced it to standby so the other node could at least serve a maintenance page while I figured out why it wasn’t loading the config.
The bigip.conf file was there and looked intact. I can’t remember now what pointed me in the right direction (maybe while doing a b load), but I finally figured out that it was missing some class files in /var/class. I had previously used Jason Rahm’s maintenance page generator script, which creates some class files used for hosting a maintenance page. Apparently the upgrade wiped out those files and the config wouldn’t load without them. (Sidenote: the iRule generated by that script isn’t compatible with 10.x, but there is a new version of the script, v2, that detects what code you’re running and builds the iRule accordingly. I have yet to use it to generate a new maintenance page and iRule.) I rsync’d the class files from the other Big-IP and that allowed the config to load. I was then able to fail back to the Big-IP with the 10.x code and it seems to be working fine. Now I just need to update the other node and pray it doesn’t try to take over after reboot.
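One way to avoid depending on the peer for those files is to archive them before starting the upgrade. The following is a minimal sketch in plain shell, not an F5-documented procedure: backup_paths is a hypothetical helper, the archive location is an assumption, and the archive should be copied off the unit since there is no guarantee any local path survives the reformat.

```shell
#!/bin/sh
# backup_paths ARCHIVE PATH [PATH...]
# Creates a gzip-compressed tar archive of the given paths.
# Refuses to run if any path is missing, so a typo does not
# silently produce an incomplete backup.
# (Hypothetical helper, not part of any F5 tooling.)
backup_paths() {
    archive=$1
    shift
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            return 1
        fi
    done
    tar -czf "$archive" "$@"
}

# Example (archive path is an assumption; /var/class is the directory
# that was wiped in this upgrade):
#   backup_paths /var/tmp/pre-upgrade.tgz /var/class
#   then copy /var/tmp/pre-upgrade.tgz to another host before upgrading
```

Listing the archive with tar -tzf before the upgrade is a cheap way to confirm the class files actually made it in.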
The first node I updated was set as the preferred active node (System > High Availability > Redundancy), so maybe that’s why it took over after the upgrade/reboot. But, that would be a bug in my opinion since the other node was healthy and active. Setting this to “None” might have kept the unwanted failover from happening, but I’m not going to downgrade and find out.
Another (minor) annoying thing was that the SSH authorized_keys were wiped out, so some monitoring scripts I had set up didn’t work until I added the monitoring host’s key back in to the authorized_keys file.
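Re-adding the key can be scripted so it is safe to run more than once. This is a minimal sketch, assuming a standard OpenSSH authorized_keys file; ensure_key is a hypothetical helper, and the file path and key in the usage line are placeholders, not values from this setup.

```shell
#!/bin/sh
# ensure_key KEYFILE "PUBLIC KEY LINE"
# Appends the key to KEYFILE only if that exact line is not already
# present, creating the file (mode 600) and its directory if needed.
# (Hypothetical helper; assumes KEYFILE ends with a newline if it exists.)
ensure_key() {
    keyfile=$1
    key=$2
    mkdir -p "$(dirname "$keyfile")"
    touch "$keyfile"
    chmod 600 "$keyfile"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$key" "$keyfile"; then
        printf '%s\n' "$key" >> "$keyfile"
    fi
}

# Example (path and key are placeholders):
#   ensure_key /root/.ssh/authorized_keys "ssh-rsa AAAA... monitor@monitoring-host"
```

Because the check matches the whole line, running it after every upgrade is harmless: an existing key is left alone and a missing one is restored.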
One final thing, I did not need to do step 6. Running the switchboot command w/o any arguments shows that HD1.2 is the default and only boot image. And, as I detailed above, the reboot in step 7 was done for me – whether I was ready for it or not.
All in all, it was not a smooth upgrade. But, I’m sure there are a lot worse things that could have happened. And, hey, at least now 10.x has vim!
March 5th, 2010
I love your posts… I understand some of the words (it, the, I) and I recognized the rest as English, but beyond that it’s all just mumbo jumbo. My dumb factor just increased 120%.
March 5th, 2010
You should configure the AOM/SCCP
https://support.f5.com/kb/en-us/solutions/public/3000/700/sol3753.html
Think of it as a DRAC/iLO for the F5: you can watch it boot, etc. over the network. There is a 2nd computer in that chassis that has its own MAC address on the management port. So you’d have 2 different IPs on the management port, one for each computer. Not a well-known feature.
March 5th, 2010
here’s the link for AOM (newer models)
https://support.f5.com/kb/en-us/solutions/public/9000/600/sol9608.html
March 5th, 2010
Yes they just never tell you about updating the license before the upgrade. We just have to find out the hard way.
May 5th, 2010
Hey, thanks a million for the detailed cautions!!! We are looking at moving our F5 load balancers from 9.3.1 to 10.1.0 in the next 2-3 months and this sort of information is massively helpful.
How much of the configuration file did you have to change?
May 5th, 2010
No changes needed to be made to the config file for us – just the issue with the stuff in /var/class (which didn’t technically need a config change). Good luck with your upgrade!
July 8th, 2010
The release notes have install/upgrade instructions, including license reactivation. They mention the reboot, albeit obliquely:
https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/product/relnote_10_1_0_ltm.html
I always do major network upgrades on site. The console port lets you know what is going on.
September 1st, 2010
Hi, I tried to upgrade a Big-IP 1600 from 9.4.8 to 10.2, but now the Big-IP will not boot. What should I do to revert it to the default? Please help me.