ACE software upgrade
Cisco Application Control Engine (ACE) module load balancers are designed to work in standalone mode or in cluster mode. In standalone mode, a software upgrade obviously has a major impact on the traffic going through the load balancer: all sessions are dropped and no new session is accepted until the ACE restarts with the new image (up to 8 minutes).
In cluster mode, however, you can upgrade the software with no or very limited impact if you follow the correct sequence of operations. Here are the steps I used last time; the upgrade went perfectly and was transparent to the users.
Note that this procedure has been tested only on ACE modules for the Catalyst 6500, but it should remain valid for ACE 4710 appliances.
Step 1
First, make sure all the contexts are properly synchronized and the standby contexts are in the STANDBY_HOT state.
ACE_1/Admin# sh ft group brief
FT Group ID: 1  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: Admin  Context Id: 0
FT Group ID: 2  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_COLD
                Context Name: C1  Context Id: 4
FT Group ID: 3  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: C2  Context Id: 3
As you can see, context C1 is stuck in the STANDBY_COLD state. Usually, taking that context's FT group out of service on the standby ACE and then putting it back in service solves the issue. If it does not, the software upgrade will not be fully transparent for that context: current sessions will be dropped, but new sessions will be accepted after the failover. If that is acceptable to you, go on with the upgrade; otherwise, try to find out why the context is not in the STANDBY_HOT state.
Note that it might take several minutes to leave the STANDBY_BULK state (it took 2 minutes during my tests).
ACE_2/Admin(config)# ft group 2
ACE_2/Admin(config-ft-group)# no inservice
ACE_2/Admin(config-ft-group)# do sh ft group 2 detail
FT Group                     : 2
No. of Contexts              : 1
Context Name                 : C1
Context Id                   : 4
Configured Status            : out-of-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_INIT
My Config Priority           : 90
My Net Priority              : 90
My Preempt                   : Enabled
Peer State                   : FSM_FT_STATE_UNKNOWN
Peer Config Priority         : Unknown
Peer Net Priority            : Unknown
Peer Preempt                 : Unknown
Peer Id                      : 1
Last State Change time       : Wed Feb 3 14:35:36 2010
Running cfg sync enabled     : Enabled
Running cfg sync status      :
Startup cfg sync enabled     : Enabled
Startup cfg sync status      :
Bulk sync done for ARP: 0
Bulk sync done for LB: 0
Bulk sync done for ICM: 0
ACE_2/Admin(config-ft-group)# inservice
NOTE: Configuration mode has been disabled on all sessions
ACE_2/Admin(config-ft-group)# do sh ft group 2 detail
FT Group                     : 2
No. of Contexts              : 1
Context Name                 : C1
Context Id                   : 4
Configured Status            : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_STANDBY_BULK
My Config Priority           : 90
My Net Priority              : 90
My Preempt                   : Enabled
Peer State                   : FSM_FT_STATE_ACTIVE
Peer Config Priority         : 120
Peer Net Priority            : 120
Peer Preempt                 : Enabled
Peer Id                      : 1
Last State Change time       : Wed Feb 3 14:36:02 2010
Running cfg sync enabled     : Enabled
Running cfg sync status      : Running configuration sync has completed
Startup cfg sync enabled     : Enabled
Startup cfg sync status      : Startup configuration sync has completed
Bulk sync done for ARP: 1
Bulk sync done for LB: 0
Bulk sync done for ICM: 0
ACE_2/Admin(config-ft-group)# do sh ft group 2 detail
FT Group                     : 2
No. of Contexts              : 1
Context Name                 : C1
Context Id                   : 4
Configured Status            : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_STANDBY_HOT
My Config Priority           : 90
My Net Priority              : 90
My Preempt                   : Enabled
Peer State                   : FSM_FT_STATE_ACTIVE
Peer Config Priority         : 120
Peer Net Priority            : 120
Peer Preempt                 : Enabled
Peer Id                      : 1
Last State Change time       : Wed Feb 3 14:37:51 2010
Running cfg sync enabled     : Enabled
Running cfg sync status      : Running configuration sync has completed
Startup cfg sync enabled     : Enabled
Startup cfg sync status      : Startup configuration sync has completed
Bulk sync done for ARP: 1
Bulk sync done for LB: 2
Bulk sync done for ICM: 2
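If you have many contexts, the check at the start of this step can be scripted. The following is only a minimal sketch, assuming you have already captured the text of sh ft group brief from the active ACE by whatever means you use (how the text is collected is out of scope here). The function names are my own and are not part of any Cisco tool.

```python
import re

# One entry of "sh ft group brief"; \s+ tolerates both line breaks and
# single spaces, so a flattened copy/paste of the output also parses.
FT_BRIEF_RE = re.compile(
    r"FT Group ID:\s*(?P<group>\d+)\s+"
    r"My State:\s*(?P<my_state>FSM_FT_STATE_\w+)\s+"
    r"Peer State:\s*(?P<peer_state>FSM_FT_STATE_\w+)\s+"
    r"Context Name:\s*(?P<context>\S+)\s+"
    r"Context Id:\s*(?P<context_id>\d+)"
)


def parse_ft_brief(output):
    """Return one dict per FT group found in the captured output."""
    return [m.groupdict() for m in FT_BRIEF_RE.finditer(output)]


def peers_not_hot(output):
    """Names of contexts whose peer is not STANDBY_HOT (run on the active ACE)."""
    return [
        g["context"]
        for g in parse_ft_brief(output)
        if g["peer_state"] != "FSM_FT_STATE_STANDBY_HOT"
    ]


# Output copied from the example in this step.
SAMPLE = """
FT Group ID: 1  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: Admin  Context Id: 0
FT Group ID: 2  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_COLD
                Context Name: C1  Context Id: 4
FT Group ID: 3  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: C2  Context Id: 3
"""

print(peers_not_hot(SAMPLE))  # ['C1']
```

An empty list means every standby context is in STANDBY_HOT and you can move on to the next step.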
Step 2
On the ACE, preemption is enabled by default for all the contexts. It needs to be disabled to perform a manual failover.
ACE_1/Admin(config)# ft group 1
ACE_1/Admin(config-ft-group)# no preempt
ACE_1/Admin(config-ft-group)# ft group 2
ACE_1/Admin(config-ft-group)# no preempt
ACE_1/Admin(config-ft-group)# ft group 3
ACE_1/Admin(config-ft-group)# no preempt
ACE_1/Admin(config-ft-group)# end
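With more than a handful of FT groups, typing these lines by hand gets tedious. Here is a small sketch that only builds the list of configuration lines to paste; it does not connect to the ACE. The function name and its parameters are my own, while the commands it emits are the ones shown above.

```python
def preempt_commands(group_ids, enable):
    """Build the config-mode lines that toggle preemption on each FT group."""
    keyword = "preempt" if enable else "no preempt"
    lines = []
    for group_id in group_ids:
        lines.append(f"ft group {group_id}")
        lines.append(keyword)
    lines.append("end")
    return lines


# Disable preemption on FT groups 1 to 3, as in this step.
for line in preempt_commands([1, 2, 3], enable=False):
    print(line)
```

Calling it with enable=True produces the lines for turning preemption back on once the upgrade is finished.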
Step 3
Download the new software image to both the active and standby ACEs. Here I chose TFTP because I didn't have an FTP server configured in the lab; FTP can be used and is definitely faster.
ACE_1/Admin# copy tftp: image:
Enter source filename[]? c6ace-t1k9-mz.A2_2_3.bin
Enter the destination filename[]? [c6ace-t1k9-mz.A2_2_3.bin]
Address of remote host[]? 10.1.1.1
Trying to connect to tftp server......
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
(…)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
TFTP get operation was successful
31361516 bytes copied
ACE_1/Admin#
ACE_1/Admin# dir image:
30788103 Apr 15 13:14:48 2009 c6ace-t1k9-mz.A2_1_4a.bin
31361516 Feb 3 14:43:45 2010 c6ace-t1k9-mz.A2_2_3.bin
Usage for image: filesystem
461848576 bytes total used
577126400 bytes free
1038974976 total bytes
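Check that the file size is correct: the size listed by dir image: should match the byte count reported at the end of the transfer (31361516 bytes here).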
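The size comparison can also be done on captured text instead of by eye. This is a rough sketch that assumes the transfer log and the dir image: listing look like the ones above; the function names are hypothetical. Note that it only compares byte counts, which catches a truncated transfer but not a corrupted one.

```python
import re

# "<size> <Mon> <day> <hh:mm:ss> <year> <file>.bin" as printed by "dir image:".
DIR_ENTRY_RE = re.compile(r"(\d+)\s+\w{3}\s+\d+\s+[\d:]+\s+\d{4}\s+(\S+\.bin)")
COPIED_RE = re.compile(r"(\d+) bytes copied")


def image_sizes(dir_output):
    """Map each image filename to its size in bytes."""
    return {name: int(size) for size, name in DIR_ENTRY_RE.findall(dir_output)}


def bytes_copied(copy_output):
    """Byte count reported at the end of the transfer, or None if absent."""
    match = COPIED_RE.search(copy_output)
    return int(match.group(1)) if match else None


def image_size_ok(copy_output, dir_output, filename):
    """True when the stored file has exactly the number of bytes transferred."""
    copied = bytes_copied(copy_output)
    return copied is not None and image_sizes(dir_output).get(filename) == copied


# Text copied from the example in this step.
COPY_OUT = "TFTP get operation was successful\n31361516 bytes copied\n"
DIR_OUT = (
    "30788103 Apr 15 13:14:48 2009 c6ace-t1k9-mz.A2_1_4a.bin\n"
    "31361516 Feb 3 14:43:45 2010 c6ace-t1k9-mz.A2_2_3.bin\n"
)

print(image_size_ok(COPY_OUT, DIR_OUT, "c6ace-t1k9-mz.A2_2_3.bin"))  # True
```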
Step 4
Change the boot string on the active ACE; it will be synced to the standby ACE. Since configuration mode is disabled on the standby ACE, this is the only option anyway.
ACE_1/Admin# sh run | i boot
Generating configuration....
boot system image:c6ace-t1k9-mz.A2_1_4a.bin
ACE_1/Admin# conf t
Enter configuration commands, one per line. End with CNTL/Z.
ACE_1/Admin(config)# no boot system image:c6ace-t1k9-mz.A2_1_4a.bin
ACE_1/Admin(config)# boot system image:c6ace-t1k9-mz.A2_2_3.bin
ACE_1/Admin(config)# exit
ACE_1/Admin# wr mem all
Generating configuration....
running config of context Admin saved
Generating configuration....
running config of context C2 saved
Generating configuration....
running config of context C1 saved
Please wait ... sync to compact flash in progress.
This may take a few minutes to complete
Sync Done
Step 5 (optional)
Create a checkpoint in all contexts on the active and standby devices.
ACE_2/Admin# checkpoint create 20100203
Generating configuration....
Created configuration checkpoint '20100203'
ACE_2/Admin# changeto C2
NOTE: Configuration mode has been disabled on all sessions
ACE_2/C2# checkpoint create 20100203
Generating configuration....
Created configuration checkpoint '20100203'
ACE_2/C2# changeto C1
NOTE: Configuration mode has been disabled on all sessions
ACE_2/C1# checkpoint create 20100203
Generating configuration....
Created configuration checkpoint '20100203'
ACE_2/C1# changeto Admin
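If you have a long list of contexts, the sequence above can be generated rather than typed. The sketch below only prints the exec-mode lines in the same order as my session. It assumes the session starts in the Admin context and that you name checkpoints after the date, as I did; the function names are made up for this example.

```python
from datetime import date


def checkpoint_name(day):
    """Date-based checkpoint name, e.g. 20100203 for 3 February 2010."""
    return day.strftime("%Y%m%d")


def checkpoint_commands(other_contexts, name):
    """Lines to type from the Admin context to checkpoint every context."""
    lines = [f"checkpoint create {name}"]  # Admin context first
    for context in other_contexts:
        lines.append(f"changeto {context}")
        lines.append(f"checkpoint create {name}")
    lines.append("changeto Admin")  # return to where the session started
    return lines


for line in checkpoint_commands(["C2", "C1"], checkpoint_name(date(2010, 2, 3))):
    print(line)
```

Run the resulting lines on both the active and the standby ACE.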
Step 6
Reload the standby device.
ACE_2/Admin# reload
This command will reboot the system
Save configurations for all the contexts. Save? [yes/no]: [yes] no    (already done in step 4)
Perform system reload. [yes/no]: [yes]
NOTE: Configuration mode is enabled on all sessions
Connection to ACE_2 closed by remote host.
Connection to ACE_2 closed.
Step 7
Check that the standby device is running the new software version.
ACE_2/Admin# sh ver
Cisco Application Control Software (ACSW)
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2009, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software are covered under the GNU Public
License. A copy of the license is available at
http://www.gnu.org/licenses/gpl.html.

Software
  loader:    Version 12.2[120]
  system:    Version A2(2.3) [build 3.0(0)A2(2.3)]
  system image file: [LCP] disk0:c6ace-t1k9-mz.A2_2_3.bin
  installed license: ACE-VIRT-020

Hardware
  Cisco ACE (slot: 6)
  cpu info:
    number of cpu(s): 2
    cpu type: SiByte
    cpu: 0, model: SiByte SB1 V0.2, speed: 700 MHz
    cpu: 1, model: SiByte SB1 V0.2, speed: 700 MHz
  memory info:
    total: 827128 kB, free: 256000 kB
    shared: 0 kB, buffers: 1824 kB, cached 0 kB
  cf info:
    filesystem: /dev/cf
    total: 1014624 kB, used: 451040 kB, available: 563584 kB

last boot reason:  reload command by Admin
configuration register:  0x1
ACE_2 kernel uptime is 0 days 0 hour 8 minute(s) 45 second(s)
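The two lines that matter in this output are the system version and the system image file. As a sketch, here is one way to pull them out of captured sh ver text and compare the image against the file you configured in step 4. The function names are hypothetical, and the parsing assumes the output layout shown above.

```python
import re


def running_software(show_version_output):
    """Return (system version, system image file); either may be None."""
    version = re.search(r"system:\s+Version\s+(\S+)", show_version_output)
    image = re.search(
        r"system image file:\s+(?:\[\w+\]\s+)?(\S+)", show_version_output
    )
    return (
        version.group(1) if version else None,
        image.group(1) if image else None,
    )


def booted_expected_image(show_version_output, expected_filename):
    """True when the running image file name ends with the expected file name."""
    _, image = running_software(show_version_output)
    return image is not None and image.endswith(expected_filename)


# Excerpt of the output from this step.
SAMPLE = """
Software
  loader:    Version 12.2[120]
  system:    Version A2(2.3) [build 3.0(0)A2(2.3)]
  system image file: [LCP] disk0:c6ace-t1k9-mz.A2_2_3.bin
  installed license: ACE-VIRT-020
"""

print(running_software(SAMPLE))
# ('A2(2.3)', 'disk0:c6ace-t1k9-mz.A2_2_3.bin')
print(booted_expected_image(SAMPLE, "c6ace-t1k9-mz.A2_2_3.bin"))  # True
```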
Step 8
Wait until all the contexts on the standby device stabilize in the STANDBY_WARM or STANDBY_HOT state.
ACE_2/Admin# sh ft group brief
FT Group ID: 1  My State:FSM_FT_STATE_STANDBY_WARM  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: Admin  Context Id: 0
FT Group ID: 2  My State:FSM_FT_STATE_STANDBY_WARM  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: C1  Context Id: 4
FT Group ID: 3  My State:FSM_FT_STATE_STANDBY_WARM  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: C2  Context Id: 3
For your information, here is what Cisco says about the STANDBY_WARM state:
In the STANDBY_WARM state, as with the STANDBY_HOT state, configuration mode is disabled on the standby ACE and configuration and state synchronization continues. A failover from the active to the standby based on priorities and preempt can still occur while the standby is in the STANDBY_WARM state. However, while stateful failover is possible for a WARM standby, it is not guaranteed. In general, modules should be allowed to remain in this state only for a short period.
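Rather than re-running the command by hand, the wait in this step can be expressed as a polling loop. The sketch below is illustrative only: fetch is a hypothetical callable that you would have to supply yourself, returning the current sh ft group brief text from the standby ACE. The timeout and interval defaults are arbitrary values I picked, not Cisco recommendations.

```python
import re
import time

READY_STATES = {"FSM_FT_STATE_STANDBY_WARM", "FSM_FT_STATE_STANDBY_HOT"}


def my_states(output):
    """All 'My State' values found in captured 'sh ft group brief' output."""
    return re.findall(r"My State:\s*(FSM_FT_STATE_\w+)", output)


def wait_until_stable(fetch, timeout_s=600, interval_s=15, sleep=time.sleep):
    """Poll fetch() until every FT group is WARM or HOT; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = my_states(fetch())
        if states and all(state in READY_STATES for state in states):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Demo with canned output instead of a real device.
BULK = "FT Group ID: 1 My State:FSM_FT_STATE_STANDBY_BULK Peer State:FSM_FT_STATE_ACTIVE"
WARM = "FT Group ID: 1 My State:FSM_FT_STATE_STANDBY_WARM Peer State:FSM_FT_STATE_ACTIVE"
canned = iter([BULK, BULK, WARM])

print(wait_until_stable(lambda: next(canned), sleep=lambda seconds: None))  # True
```

The sleep parameter exists only so the loop can be exercised without real delays, as in the demo.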
Step 9
Perform a failover from the active ACE to the standby ACE for all the contexts.
ACE_1/Admin# ft switchover all
This command will cause card to switchover (yes/no)? [no] yes
NOTE: Configuration mode has been disabled on all sessions
Step 10
Check that the newly upgraded ACE has become active.
ACE_1/Admin# sh ft group brief
FT Group ID: 1  My State:FSM_FT_STATE_STANDBY_BULK  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: Admin  Context Id: 0
FT Group ID: 2  My State:FSM_FT_STATE_STANDBY_BULK  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: C1  Context Id: 4
FT Group ID: 3  My State:FSM_FT_STATE_STANDBY_BULK  Peer State:FSM_FT_STATE_ACTIVE
                Context Name: C2  Context Id: 3
Step 11
Reload the second ACE (previously active).
ACE_1/Admin# reload
This command will reboot the system
Save configurations for all the contexts. Save? [yes/no]: [yes] no
Perform system reload. [yes/no]: [yes]
NOTE: Configuration mode is enabled on all sessions
Connection to ACE_1 closed by remote host.
Connection to ACE_1 closed.
Step 12
When the second ACE stabilizes in the FSM_FT_STATE_STANDBY_HOT state, perform a failover again for all the contexts.
ACE_2/Admin# sh ft group brief
FT Group ID: 1  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: Admin  Context Id: 0
FT Group ID: 2  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: C1  Context Id: 4
FT Group ID: 3  My State:FSM_FT_STATE_ACTIVE  Peer State:FSM_FT_STATE_STANDBY_HOT
                Context Name: C2  Context Id: 3
Step 13 (If you’re not superstitious)
Re-enable preemption if it is part of your standard configuration. (Personally, I don’t like preemption: if a device has failed, I prefer to check exactly why before making it active again.)
ACE_1/Admin(config)# ft group 1
ACE_1/Admin(config-ft-group)# preempt
ACE_1/Admin(config-ft-group)# ft group 2
ACE_1/Admin(config-ft-group)# preempt
ACE_1/Admin(config-ft-group)# ft group 3
ACE_1/Admin(config-ft-group)# preempt
ACE_1/Admin(config-ft-group)# end
ACE_1/Admin# wr mem
And that’s it: you have upgraded your ACE cluster with no or limited impact. If you find this post helpful, you may leave a comment to encourage me to publish more articles…