Gateway Load Balancing Protocol



Cisco Gateway Load Balancing Protocol (GLBP) improves the efficiency of FHRPs by allowing automatic load balancing of the default gateway. The advantage of GLBP is that it additionally provides load balancing over multiple routers (gateways) using a single virtual IP address and multiple virtual MAC addresses per GLBP group. (In contrast, both HSRP and VRRP use only one virtual MAC address per HSRP/VRRP group.) The forwarding load is shared among all routers in a GLBP group rather than handled by a single router while the other routers stand idle. Each host is configured with the same virtual IP address, and all routers in the virtual router group participate in forwarding packets.
Members of a GLBP group elect one gateway to be the active virtual gateway (AVG) for that group. Other group members provide backup for the AVG if the AVG becomes unavailable. The AVG assigns a virtual MAC address to each member of the GLBP group. Each gateway assumes responsibility for forwarding packets sent to the virtual MAC address assigned to it by the AVG. These gateways are known as active virtual forwarders (AVF) for their virtual MAC address.
The AVG is also responsible for answering Address Resolution Protocol (ARP) requests for the virtual IP address. Load sharing is achieved by the AVG replying to the ARP requests with different virtual MAC addresses (corresponding to each gateway router).
In Figure 1, Router A is the AVG for a GLBP group and is primarily responsible for the virtual IP address 172.16.128.3; however, Router A is also an AVF for the virtual MAC address 0007.b400.0101. Router B is a member of the same GLBP group and is designated as the AVF for the virtual MAC address 0007.b400.0102. All hosts have their default gateway IP addresses set to the virtual IP address of 172.16.128.3. However, when these hosts send an ARP request to resolve the MAC address of this virtual IP address, Hosts A and C receive a gateway MAC address of 0007.b400.0101 (directing these hosts to use Router A as their default gateway), whereas Hosts B and D receive a gateway MAC address of 0007.b400.0102 (directing these hosts to use Router B as their default gateway). In this way, the gateway routers automatically share the load.

 
Figure 1: GLBP topology
If Router A becomes unavailable, Hosts A and C do not lose access to the WAN because Router B assumes responsibility for forwarding packets sent to the virtual MAC address of Router A and for responding to packets sent to its own virtual MAC address. Router B also assumes the role of the AVG for the entire GLBP group. Communication for the GLBP members continues despite the failure of a router in the GLBP group.
Additionally, like HSRP and VRRP, GLBP supports object tracking, preemption, and SSO awareness.
Note 
SSO awareness for GLBP is enabled by default when the route processor’s redundancy mode of operation is set to SSO (as was shown in the “NSF with SSO” section of this chapter).
However, unlike the object tracking logic used by HSRP and VRRP, GLBP uses a weighting scheme to determine the forwarding capacity of each router in the GLBP group. The weighting assigned to a router in the GLBP group determines whether it forwards packets and, if so, the proportion of hosts in the LAN for which it forwards packets. Thresholds can be set so that when the weighting for a GLBP group falls below a lower threshold, forwarding is automatically disabled, and when it rises above an upper threshold, forwarding is automatically reenabled.
The GLBP group weighting can be automatically adjusted by tracking the state of an interface within the router. If a tracked interface goes down, the GLBP group weighting is reduced by a specified value. Different interfaces can be tracked to decrement the GLBP weighting by varying amounts.
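The weighting scheme and thresholds described above can be sketched with the following configuration (the weighting values and the second tracked object are hypothetical, chosen for illustration): the router starts with a maximum weighting of 100, stops forwarding when the weighting falls below 70, and resumes forwarding only after it rises back above 90; each tracked WAN link decrements the weighting by 20 when its line protocol goes down:

Router(config)# track 110 interface Serial0/1 line-protocol
Router(config)# track 111 interface Serial0/2 line-protocol
Router(config)# interface GigabitEthernet0/0
Router(config-if)# glbp 10 weighting 100 lower 70 upper 90
Router(config-if)# glbp 10 weighting track 110 decrement 20
Router(config-if)# glbp 10 weighting track 111 decrement 20

With both serial links down, the weighting drops to 60, which is below the lower threshold, so the router gives up its AVF role; it does not resume forwarding until the weighting climbs back above 90.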
Example 1 shows a GLBP configuration that can be used on the LAN interface of the AVG from Figure 1. Each GLBP group on a given subnet requires a unique number; in this example the GLBP group number is set to 10. The virtual IP address for the GLBP group is set to 172.16.128.3. The GLBP priority of this interface has been set to 105, and like HSRP, preemption for GLBP must be explicitly enabled (if desired). Finally, object tracking has been configured so that should the line protocol state of interface Serial0/1 go down (the WAN link for this router, which is designated as object-number 110), the GLBP weighting for this interface dynamically decrements (by a value of 10, by default).
Example 1: GLBP Example

Router(config)# track 110 interface Serial0/1 line-protocol
Router(config)# interface GigabitEthernet0/0
Router(config-if)# ip address 172.16.128.1 255.255.255.0
Router(config-if)# glbp 10 ip 172.16.128.3
Router(config-if)# glbp 10 priority 105
Router(config-if)# glbp 10 preempt
Router(config-if)# glbp 10 weighting track 110

Virtual Router Redundancy Protocol | L3 Network Availability Protocols



The Virtual Router Redundancy Protocol (VRRP), defined in RFC 2338, is an FHRP that is similar to HSRP but capable of supporting multivendor environments. A VRRP router is configured to run VRRP in conjunction with one or more other routers attached to a LAN. In a VRRP configuration, one router is elected as the virtual router master, with the other routers acting as backups in case the virtual router master fails.
VRRP enables a group of routers to form a single virtual router. The LAN clients can then be configured with the virtual router as their default gateway. The virtual router, representing a group of routers, is also known as a VRRP group.
Figure 1 shows a LAN topology with VRRP configured. In this example, two VRRP routers (routers running VRRP) comprise a virtual router. However, unlike HSRP, the IP address of the virtual router is the same as that configured for the LAN interface of the virtual router master, in this example 172.16.128.1.

 
Figure 1: VRRP topology.
Router A assumes the role of the virtual router master and is also known as the IP address owner because the IP address of the virtual router belongs to it. As the virtual router master, Router A is responsible for forwarding packets sent to this IP address. Each IP host on the subnet is configured with the default gateway IP address of the virtual router master, in this case 172.16.128.1.
Router B, on the other hand, functions as a virtual router backup. If the virtual router master fails, the backup router with the highest priority becomes the virtual router master and provides uninterrupted service for the LAN hosts. When Router A recovers, it becomes the virtual router master again.
Additionally, like HSRP, VRRP supports object tracking, preemption, and SSO awareness.
Note 
SSO awareness for VRRP is enabled by default when the route processor’s redundancy mode of operation is set to SSO (as was shown in the “NSF with SSO” section of this chapter).
Example 1 shows a VRRP configuration that can be used on the LAN interface of the virtual router master from Figure 1. Each VRRP group on a given subnet requires a unique number; in this example the VRRP group number is set to 10. The virtual IP address is set to the actual LAN interface address, designating this router as the virtual router master. The VRRP priority of this router has been set to 105. Unlike HSRP, preemption for VRRP is enabled by default. Finally, object tracking has been configured so that should the line protocol state of interface Serial0/1 go down (the WAN link for this router, which is designated as object-number 110), the VRRP priority for this interface dynamically decrements (by a value of 10, by default).
Example 1: VRRP Example

Router(config)# track 110 interface Serial0/1 line-protocol
Router(config)# interface GigabitEthernet0/0
Router(config-if)# ip address 172.16.128.1 255.255.255.0
Router(config-if)# vrrp 10 ip 172.16.128.1
Router(config-if)# vrrp 10 priority 105
Router(config-if)# vrrp 10 track 110

A drawback to both HSRP and VRRP is that the standby/backup router is not used to forward traffic, wasting both available bandwidth and processing capability. This limitation can be worked around by provisioning two complementary HSRP/VRRP groups on each LAN subnet, with one group having the left router as the active/master router and the other group having the right router as the active/master router. Then, approximately half of the hosts are configured to use the virtual IP address of one HSRP/VRRP group, and the remaining hosts are configured to use the virtual IP address of the second group. Obviously, this requires additional operational and management complexity. To improve the efficiency of these FHRP models without such additional complexity, Cisco developed the Gateway Load Balancing Protocol (GLBP).
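The dual-group workaround described above can be sketched as follows (the second group number, virtual IP address, and priority values are hypothetical, chosen for illustration). Using HSRP, the left router is configured as the active router for group 10 and the standby for group 20:

RouterA(config)# interface GigabitEthernet0/0
RouterA(config-if)# ip address 172.16.128.1 255.255.255.0
RouterA(config-if)# standby 10 ip 172.16.128.3
RouterA(config-if)# standby 10 priority 105 preempt
RouterA(config-if)# standby 20 ip 172.16.128.4
RouterA(config-if)# standby 20 priority 95

The right router mirrors this configuration with the priorities swapped (95 for group 10, 105 for group 20), and half the hosts point their default gateway at 172.16.128.3 while the other half use 172.16.128.4.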

Hot Standby Router Protocol | L3 Network Availability Protocols



TelePresence codecs route all traffic through their default gateway router. If (for whatever reason) the default gateway router fails, TelePresence calls hang or self-terminate, depending on the amount of time it takes for the default gateway router to recover (or be replaced).
The Cisco Hot Standby Router Protocol (HSRP) is the first of three First-Hop Redundancy Protocols (FHRP) discussed in this chapter (the other two being VRRP and GLBP). An FHRP provides increased availability by allowing for transparent failover of the first-hop (or default gateway) router.
HSRP is used in a group of routers for selecting an active router and a standby router. In a group of router interfaces, the active router is the router of choice for routing packets; the standby router is the router that takes over when the active router fails or when preset conditions are met.
Endpoint devices, or IP hosts, are configured with the IP address of a single router as their default gateway. When HSRP is used, the HSRP virtual IP address is configured as the host’s default gateway instead of the actual IP address of the router.
When HSRP is configured on a network segment, it provides a virtual MAC address and an IP address that is shared among a group of routers running HSRP. The address of this HSRP group is referred to as the virtual IP address. One of these devices is selected by HSRP to be the active router. The active router receives and routes packets destined for the MAC address of the group.
HSRP detects when the designated active router fails; at which point, a selected standby router assumes control of the MAC and IP addresses of the Hot Standby group. A new standby router is also selected at that time.
HSRP uses a priority mechanism to determine which HSRP configured router is to be the default active router. To configure a router as the active router, you assign it a priority that is higher than the priority of all the other HSRP-configured routers. The default priority is 100, so if just one router is configured to have a higher priority, that router will be the default active router.
Devices that run HSRP send and receive multicast User Datagram Protocol (UDP)-based hello messages to detect router failure and to designate active and standby routers. When the active router fails to send a hello message within a configurable period of time, the standby router with the highest priority becomes the active router. The transition of packet forwarding functions between routers is completely transparent to all hosts on the network.
Multiple Hot Standby groups can be configured on an interface, thereby making fuller use of redundant routers and load sharing.
Figure 1 shows a network configured for HSRP. By sharing a virtual MAC address and IP address, two or more routers can act as a single virtual router. The virtual router does not physically exist but represents the common default gateway for routers that are configured to provide backup to each other. All IP hosts are configured with the IP address of the virtual router as their default gateway. If the active router fails to send a hello message within the configurable period of time, the standby router takes over, responds to the virtual addresses, and assumes the active router duties.

 
Figure 1: HSRP topology
HSRP also supports object tracking so that the HSRP priority of a router can dynamically change when an object that is tracked goes down. Examples of objects that can be tracked are the line protocol state of an interface or the reachability of an IP route. If the specified object goes down, the HSRP priority is reduced.
Furthermore, HSRP supports SSO awareness so that HSRP can alter its behavior when a router with redundant Route Processors (RP) is configured in SSO redundancy mode. When one RP is active and the other RP is standby, SSO enables the standby RP to take over if the active RP fails.
With this functionality, HSRP SSO information is synchronized to the standby RP, allowing traffic that is sent using the HSRP virtual IP address to be continuously forwarded during a switchover without a loss of data or a path change. Additionally, if both RPs fail on the active HSRP router, the standby HSRP router takes over as the active HSRP router.
Note 
SSO awareness for HSRP is enabled by default when the RP’s redundancy mode of operation is set to SSO (as shown in the “NSF with SSO” section of this chapter).
Example 1 demonstrates the HSRP configuration that you can use on the LAN interface of the active router from Figure 1. Each HSRP group on a given subnet requires a unique number; in this example the HSRP group number is set to 10. The virtual router’s IP address (which is what each IP host on the network uses as a default-gateway address) is set to 172.16.128.3. The HSRP priority of this router has been set to 105, and preemption has been enabled on it; preemption allows for the router to immediately take over as the virtual router (provided it has the highest priority on the segment). Finally, object tracking has been configured so that if the line protocol state of interface Serial0/1 goes down (the WAN link for the active router, which is designated as object-number 110), the HSRP priority for this interface dynamically decrements (by a value of 10, by default).
Example 1: HSRP Example

Router(config)# track 110 interface Serial0/1 line-protocol
Router(config)# interface GigabitEthernet0/0
Router(config-if)# ip address 172.16.128.1 255.255.255.0
Router(config-if)# standby 10 ip 172.16.128.3
Router(config-if)# standby 10 priority 105 preempt
Router(config-if)# standby 10 track 110

As HSRP was the first FHRP, and because it was invented by Cisco, it is Cisco proprietary. However, to support multivendor interoperability, aspects of HSRP were standardized in the Virtual Router Redundancy Protocol (VRRP).

EtherChannels, Cisco Port Aggregation Protocol, and IEEE 802.3ad



Ethernet link speeds are standardized in factors of 10 (Ethernet, FastEthernet, GigabitEthernet, and Ten Gigabit Ethernet). When switch-to-switch links within TelePresence campus networks become congested, however, it might be costly to upgrade by a full factor of 10. It is generally more cost-effective to add another parallel link at the same speed; however, as more parallel links are added, these might become operationally complex to administer. Therefore, administration of multiple redundant links can be simplified through the use of EtherChannels.
EtherChannel technologies create a single logical link by bundling multiple physical Ethernet-based links (such as Gigabit Ethernet or Ten Gigabit Ethernet links) together, as shown in Figure 1. As such, EtherChannel links can provide for increased redundancy, capacity, and load-balancing. To optimize the load balancing of traffic over multiple links, it is recommended to deploy EtherChannels in powers of two (two, four, or eight) physical links. EtherChannel links can operate at either L2 or L3.
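Traffic is distributed across the physical links by hashing frame or packet header fields; the powers-of-two recommendation above exists because the hash result divides evenly only across two, four, or eight links. On many Catalyst platforms, the fields used by the hash can be tuned globally; for example, to balance on the source and destination IP address pair (the exact options available vary by platform and software release):

Switch(config)# port-channel load-balance src-dst-ip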

 
Figure 1: EtherChannel bundle
EtherChannel links can be created using Cisco Port Aggregation Protocol (PAgP), which performs a negotiation prior to forming a channel, to ensure compatibility and administrative policies.
You can configure PAgP in four channeling modes:
  • On: Forces the LAN port to channel unconditionally. In the on mode, a usable EtherChannel exists only when a LAN port group in the on mode is connected to another LAN port group in the on mode. Ports configured in the on mode do not negotiate to form EtherChannels: They just do or do not, depending on the other port’s configuration.
  • Off: Precludes the LAN port from channeling unconditionally.
  • Desirable: Places a LAN port into an active negotiating state in which the port initiates negotiations with other LAN ports to form an EtherChannel by sending PAgP packets. A port in this mode forms an EtherChannel with a peer port that is in either auto or desirable PAgP mode.
  • Auto: (Default) Places a LAN port into a passive negotiating state in which the port responds to PAgP packets it receives but does not initiate PAgP negotiation. A port in this mode forms an EtherChannel with a peer port that is in desirable PAgP mode (only).
When configured on an L2 link, PAgP is enabled on the physical interface only. Optionally, you can change the PAgP mode from the default “auto” negotiation mode, as follows:
Switch(config)# interface GigabitEthernet8/1
Switch(config-if)# channel-protocol pagp
Switch(config-if)# channel-group 15 mode desirable
Alternatively, EtherChannels can be negotiated with the IEEE 802.3ad Link Aggregation Control Protocol (LACP), which similarly allows a switch to negotiate an automatic bundle by sending LACP packets to the peer. LACP supports two channel negotiation modes:
  • Active: Places a port into an active negotiating state in which the port initiates negotiations with other ports by sending LACP packets. A port in this mode forms a bundle with a peer port that is in either active or passive LACP mode.
  • Passive: (Default) Places a port into a passive negotiating state in which the port responds to LACP packets it receives but does not initiate LACP negotiation. A port in this mode forms a bundle with a peer port that is in active LACP mode (only).
Similar to PAgP, LACP requires only a single command on the physical interface when configured as an L2 link. Optionally, you can change the LACP mode from the default “passive” negotiation mode, as follows:
Switch(config)# interface GigabitEthernet8/2
Switch(config-if)# channel-protocol lacp
Switch(config-if)# channel-group 16 mode active
Note that PAgP and LACP do not interoperate with each other; ports configured to use PAgP cannot form EtherChannels with ports configured to use LACP, and ports configured to use LACP cannot form EtherChannels with ports configured to use PAgP.
EtherChannel plays a critical role in provisioning network link redundancy, especially at the campus distribution and core layers. Furthermore, an evolution of EtherChannel technology plays a key role in the Cisco Virtual Switching System.

Trunks, Cisco Inter-Switch Link, and IEEE 802.1Q



TelePresence codecs are assigned to the Voice VLAN, whereas most endpoint devices operate within the Data VLAN. It would be inefficient, costly, and administratively complex to use dedicated Ethernet ports and cables for each VLAN. Therefore, a logical separation of VLANs over a physical link is more efficient, cost-effective, and simpler to administer.
A trunk is a point-to-point link between two networking devices (switches or routers) capable of carrying traffic from multiple VLANs over a single link. VLAN frames are encapsulated with trunking protocols to preserve the logical separation of traffic while transiting the trunk.
There are two trunking encapsulations available to Cisco devices:
  • Inter-Switch Link (ISL): A Cisco-proprietary trunking encapsulation
  • IEEE 802.1Q: An industry-standard trunking encapsulation and the trunking protocol used by TelePresence codecs
You can configure trunks on individual links or on EtherChannel bundles (discussed in the following section).
ISL encapsulates the original Ethernet frame with both a header and a Frame Check Sequence (FCS) trailer, for a total of 30 bytes of encapsulation.
You can configure ISL trunking on a switch port interface, as demonstrated in Example 1. The trunking mode is set to ISL, and the VLANs permitted to traverse the trunk are explicitly identified. In this example VLANs 2 and 102 are permitted over the ISL trunk.
Example 1: ISL Trunk Example

Switch(config)# interface GigabitEthernet8/3
Switch(config-if)# switchport
Switch(config-if)# switchport trunk encapsulation isl
Switch(config-if)# switchport trunk allowed vlan 2,102

In contrast with ISL, 802.1Q doesn’t actually encapsulate the Ethernet frame but rather inserts a 4-byte tag after the Source Address field and recomputes a new FCS, as shown in Figure 1. This tag not only preserves VLAN information but also includes a 3-bit field for Class of Service (CoS) priority.

 
Figure 1: IEEE 802.1Q tagging
IEEE 802.1Q also supports the concept of a native VLAN. Traffic sourced from the native VLAN is not tagged but is rather simply forwarded over the trunk. As such, only a single native VLAN can be configured for an 802.1Q trunk, to preserve logical separation.
Note 
Because traffic from the native VLAN is untagged, it is important to ensure that the same native VLAN be specified on both ends of the trunk. Otherwise, this can cause a routing black-hole and potential security vulnerability.
IEEE 802.1Q trunking is likewise configured on a switch port interface, as demonstrated in Example 2. The trunking mode is set to 802.1Q, and the VLANs permitted to traverse the trunk are explicitly identified. (In this example VLANs 3 and 103 are permitted over the 802.1Q trunk.) Additionally, VLAN 103 is specified as the native VLAN.
Example 2: IEEE 802.1Q Trunk Example

Switch(config)# interface GigabitEthernet8/4
Switch(config-if)# switchport
Switch(config-if)# switchport trunk encapsulation dot1q
Switch(config-if)# switchport trunk allowed vlan 3,103
Switch(config-if)# switchport trunk native vlan 103

Trunks are typically (but not always) configured in conjunction with EtherChannels, which allow for network link redundancy.

IEEE 802.1w-Rapid Spanning Tree Protocol



Rapid Spanning Tree Protocol (RSTP) is an evolution of the 802.1D STP standard. RSTP is a Layer 2 loop prevention algorithm like 802.1D; however, RSTP achieves rapid failover and convergence times because RSTP is not a timer-based Spanning Tree Algorithm (STA) like 802.1D, but rather a handshake-based STA. Therefore, RSTP offers an improvement of 30 seconds or more (as compared to 802.1D) in transitioning a link into a Forwarding state.
The only three port states in RSTP are
  • Learning
  • Forwarding
  • Discarding
The Disabled, Blocking, and Listening states from 802.1D have been merged into a single 802.1w Discarding state, which is a nonforwarding, nonparticipating RSTP port state.
Rapid transition is the most important feature introduced by 802.1w. The legacy STA passively waited for the network to converge before moving a port into the Forwarding state. Achieving faster convergence was a matter of tuning the conservative default timers, often sacrificing the stability of the network.
RSTP can actively confirm that a port can safely transition to Forwarding without relying on any timer configuration. A feedback mechanism operates between RSTP-compliant bridges. To achieve fast convergence on a port, RSTP relies on two new variables:
  • Edge ports: The edge port concept basically corresponds to the PortFast feature. The idea is that ports that directly connect to end stations cannot create bridging loops in the network and can, thus, directly transition to Forwarding (skipping the 802.1D Listening and Learning states). An edge port does not generate topology changes when its link toggles. Unlike PortFast though, an edge port that receives a BPDU immediately loses its edge port status and becomes a normal spanning-tree port.
  • Link type: RSTP can achieve only rapid transition to Forwarding on edge ports and on point-to-point links. The link type is automatically derived from the duplex mode of a port. A port operating in full-duplex will be assumed to be point-to-point, whereas a half-duplex port will be considered as a shared port by default. In today’s switched networks, most links operate in full-duplex mode and are, therefore, treated as point-to-point links by RSTP. This makes them candidates for rapid transition to forwarding.
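On Cisco switches, both of these variables can also be set explicitly when the automatic behavior is not appropriate (the interface shown here is hypothetical): the first command marks the port as an edge port, and the second forces the link type to point-to-point regardless of the port’s duplex mode:

Switch(config)# interface GigabitEthernet8/5
Switch(config-if)# spanning-tree portfast
Switch(config-if)# spanning-tree link-type point-to-point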
Like STP, you can enable RSTP globally on a per-VLAN basis, also referred to as Rapid-Per-VLAN-Spanning Tree (Rapid-PVST) mode, using the following command:
Switch(config)# spanning-tree mode rapid-pvst

Cisco Spanning Tree Enhancements | L2 Network Availability Protocols



The STP 50-second convergence time results in TelePresence calls being self-terminated and is, therefore, unacceptable. Thus, if STP is to be used within a TelePresence campus network, STP convergence times need to be significantly improved.
To improve on STP convergence times, Cisco has made a number of enhancements to 802.1D STP, including the following:
  • PortFast (with BPDU-Guard)
  • UplinkFast
  • BackboneFast
STP PortFast causes a Layer 2 LAN port configured as an access port to enter the Forwarding state immediately, bypassing the Listening and Learning states. You can use PortFast on Layer 2 access ports connected to a single workstation or server to allow those devices to connect to the network immediately instead of waiting for STP to converge because interfaces connected to a single workstation or server should not receive BPDUs. Because the purpose of PortFast is to minimize the time that access ports must wait for STP to converge, it should be used only on access ports. Optionally, for an additional level of security, PortFast can be enabled with BPDU-Guard, which immediately shuts down a port that has received a BPDU.
You can enable PortFast globally (along with BPDU-Guard) or on a per-interface basis by entering the following commands:
Switch(config)# spanning-tree portfast default
Switch(config)# spanning-tree portfast bpduguard default
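PortFast and BPDU-Guard can alternatively be enabled on an individual access port rather than globally (the interface shown here is hypothetical):

Switch(config)# interface GigabitEthernet8/6
Switch(config-if)# spanning-tree portfast
Switch(config-if)# spanning-tree bpduguard enable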
UplinkFast provides fast convergence after a direct link failure and achieves load balancing between redundant Layer 2 links. If a switch detects a link failure on the currently active link (a direct link failure), UplinkFast unblocks the blocked port on the redundant link port and immediately transitions it to the Forwarding state without going through the Listening and Learning states, as illustrated in Figure 1. This switchover takes approximately one to five seconds.

 
Figure 1: UplinkFast recovery example after direct link failure
You can enable UplinkFast globally, as follows:
Switch(config)# spanning-tree uplinkfast
In contrast, BackboneFast provides fast convergence after an indirect link failure, as shown in Figure 2. This switchover takes approximately 30 seconds (yet improves on the default STP convergence time by 20 seconds).

 
Figure 2: BackboneFast recovery example after indirect link failure
You can enable BackboneFast globally, as follows:
Switch(config)# spanning-tree backbonefast
These Cisco-proprietary enhancements to 802.1D STP were adapted and adopted into a new standard for STP, IEEE 802.1w or Rapid Spanning Tree Protocol (RSTP)

IEEE 802.1D Spanning Tree Protocol | L2 Network Availability Protocols



In TelePresence campus networks, redundant paths are encouraged within the network design; however, redundant paths might cause Layer 2 loops and, thus, recursive forwarding and packet drops.
The IEEE 802.1D Spanning Tree Protocol (STP) prevents loops from being formed when switches are interconnected through multiple paths. STP implements the Spanning Tree Algorithm by exchanging Bridge Protocol Data Unit (BPDU) messages with other switches to detect loops and then removes the loop by blocking selected switch interfaces. This algorithm guarantees that there is one—and only one—active path between two network devices, as illustrated in Figure 1.

 
Figure 1: STP-based redundant topology
STP prevents a loop in the topology by transitioning all (STP-enabled) ports through four STP states:
  • Blocking: The port does not participate in frame forwarding. STP can take up to 20 seconds (by default) to transition a port from Blocking to Listening.
  • Listening: The transitional state after the Blocking state, entered when the spanning tree determines that the interface should participate in frame forwarding. STP takes 15 seconds (by default) to transition from Listening to Learning.
  • Learning: The port prepares to participate in frame forwarding. STP takes 15 seconds (by default) to transition from Learning to Forwarding (provided such a transition does not cause a loop; otherwise, the port will be set to Blocking).
  • Forwarding: The port forwards frames.
Figure 2 illustrates the STP states, including the disabled state.

 
Figure 2: STP port states
You can enable STP globally on a per-VLAN basis (referred to as Per-VLAN Spanning Tree [PVST]) by entering the following command:
Switch(config)# spanning-tree vlan 100
The two main availability limitations for STP follow:
  • To prevent loops, redundant ports are placed in a Blocking state and as such are not used to forward frames and packets. This significantly reduces the advantages of redundant network design, especially for network capacity and load-sharing.
  • Adding up all the times required for STP port-state transitions (up to 20 seconds from Blocking to Listening, 15 seconds from Listening to Learning, and 15 seconds from Learning to Forwarding) shows that STP can take up to 50 seconds to converge on a loop-free topology. Although this might have been acceptable when the protocol was first designed, it is certainly unacceptable today.
Both limitations are addressable using additional technologies. The first limitation can be addressed by using the Cisco Virtual Switching System, and the second limitation can be addressed by various enhancements that Cisco developed for STP.

UniDirectional Link Detection | L2 Network Availability Protocols



In TelePresence campus networks, a link can transmit in one direction only, causing a lengthy delay in fault detection and, thus, excessive packet loss.
UniDirectional Link Detection (UDLD) protocol is a Layer 2 protocol that uses keepalives to test that switch-to-switch links are connected and operating correctly. Enabling UDLD is a prime example of how to implement a defense-in-depth approach to failure detection and recovery mechanisms because UDLD (an L2 protocol) acts as a backup to the native Layer 1 unidirectional link detection capabilities provided by the IEEE 802.3z (Gigabit Ethernet) and 802.3ae (Ten Gigabit Ethernet) standards.
The UDLD protocol allows devices connected through fiber-optic or copper Ethernet cables connected to LAN ports to monitor the physical configuration of the cables and detect when a unidirectional link exists. When a unidirectional link is detected, UDLD shuts down the affected LAN port and triggers an alert. Unidirectional links, such as shown in Figure 1, can cause a variety of problems, including spanning tree topology loops.

 
Figure 1: Unidirectional link failure
You can configure UDLD to be globally enabled on all fiber ports by entering the following command:
Switch(config)# udld enable
Additionally, you can enable UDLD on individual LAN ports in interface mode by entering the following commands:
Switch(config)# interface GigabitEthernet8/1
Switch(config-if)# udld port
Interface configurations override global settings for UDLD.

Network Availability Protocols



Network availability protocols, which include link integrity protocols, link bundling protocols, loop detection protocols, first-hop redundancy protocols (FHRP), and routing protocols, increase the resiliency of devices connected within a network. Network resiliency relates to how the overall design implements redundant links and topologies and how the control-plane protocols are optimally configured to operate within that design. The use of physical redundancy is a critical part of ensuring the availability of the overall network. If a network device fails, having a redundant path means the overall network can continue to operate. The control-plane capabilities of the network provide the capability to manage the way in which the physical redundancy is leveraged, the way the network load balances traffic, the way the network converges, and the way the network is operated.
You can apply the following basic principles to network availability technologies:
  • Wherever possible, leverage the capability of the device hardware to provide the primary detection and recovery mechanism for network failures. This ensures both a faster and a more deterministic failure recovery.
  • Implement a defense-in-depth approach to failure detection and recovery mechanisms. Multiple protocols, operating on different network layers, can complement each other in detecting and reacting to network failures.
  • Ensure that the design is self-stabilizing. Use a combination of control-plane modularization to ensure that any failures are isolated in their impact and that the control plane prevents any flooding or thrashing conditions from arising.
These principles are intended to be a complementary part of the overall structured modular design approach to the network architecture and primarily serve to reinforce good resilient network design practices.
Note 
A complete discussion of all network availability technologies and best practices could easily fill an entire volume. Therefore, this discussion introduces and provides only an overview of the network availability technologies most relevant to TelePresence enterprise network deployments.
The protocols discussed in this section can be subdivided between Layer 2 (L2) and Layer 3 (L3) network availability protocols. 

Nonstop Forwarding with Stateful Switchover | Device Availability Technologies



Stateful switchover (SSO) is a redundant route- and switch-processor availability feature that significantly reduces MTTR by allowing extremely fast switching between the main and backup processors. SSO is supported on routers (such as the Cisco 7600, 10000, and 12000 series families) and switches (such as the Catalyst 4500 and 6500 series families).
Prior to discussing the details of SSO, a few definitions might be helpful. For example, “state” in SSO refers to maintaining—among many other elements—the following between the active and standby processors:
  • Layer 2 protocol configurations and current status
  • Layer 3 protocol configurations and current status
  • Multicast protocol configurations and current status
  • QoS policy configurations and current status
  • Access list policy configurations and current status
  • Interface configurations and current status
Also, the adjectives cold, warm, or hot denote the readiness of the system and its components to assume the network services functionality and the job of forwarding packets to their destination. These terms appear in conjunction with Cisco IOS verification command output relating to NSF/SSO and with many high availability feature descriptions:
  • Cold: Cold redundancy refers to the minimum degree of resiliency that has traditionally been provided by a redundant system. A redundant system is cold when no state information is maintained between the backup or standby system and the system it protects. Typically, a cold system must complete a boot process before it comes online and is ready to take over from a failed system.
  • Warm: Warm redundancy refers to a degree of resiliency beyond the cold standby system. In this case, the redundant system has been partially prepared but does not have all the state information known by the primary system, so it cannot take over immediately. Some additional information must be determined or gleaned from the traffic flow or the peer network devices to handle packet forwarding. A warm system would already be booted up and would need only to learn or generate state information prior to taking over from a failed system.
  • Hot: Hot redundancy refers to a degree of resiliency where the redundant system is fully capable of handling the traffic of the primary system. Substantial state information has been saved, so the network service is continuous, and the traffic flow is minimally or not affected.
To better understand SSO, it might be helpful to consider its operation in detail within a specific context, such as within a Cisco Catalyst 6500 with two supervisors per chassis.
The supervisor engine that boots first becomes the active supervisor engine. The active supervisor is responsible for control-plane and forwarding decisions. The second supervisor is the standby supervisor, which does not participate in the control- or data-plane decisions. The active supervisor synchronizes configuration and protocol state information to the standby supervisor, which is in a hot-standby mode. As a result, the standby supervisor is ready to take over the active supervisor responsibilities if the active supervisor fails. This “take-over” process from the active supervisor to the standby supervisor is referred to as switchover.
Only one supervisor is active at a time, and supervisor-engine redundancy does not provide supervisor-engine load balancing. However, the interfaces on a standby supervisor engine are active when the supervisor is up and, thus, can be used to forward traffic in a redundant configuration.
NSF/SSO evolved from a series of progressive enhancements to reduce the MTTR impact of specific supervisor hardware/software network outages. NSF/SSO builds on the earlier work known as Route Processor Redundancy (RPR) and RPR Plus (RPR+). Each of these redundancy modes of operation incrementally improves upon the functions of the previous mode:
  • RPR: The first redundancy mode of operation introduced in Cisco IOS Software. In RPR mode, the startup configuration and boot registers are synchronized between the active and standby supervisors; the standby is not fully initialized; and images between the active and standby supervisors do not need to be the same. Upon switchover, the standby supervisor becomes active automatically, but it must complete the boot process. In addition, all line cards are reloaded, and the hardware is reprogrammed. Because the standby supervisor is “cold,” the RPR switchover time is 2 or more minutes.
  • RPR+: An enhancement to RPR in which the standby supervisor is completely booted, and line cards do not reload upon switchover. The running configuration is synchronized between the active and the standby supervisors, which run the same software versions. All synchronization activities inherited from RPR are also performed. The synchronization is done before the switchover, and the information synchronized to the standby is used when the standby becomes active to minimize the downtime. No link layer or control-plane information is synchronized between the active and the standby supervisors. Interfaces might bounce after switchover, and the hardware contents need to be reprogrammed. Because the standby supervisor is “warm,” the RPR+ switchover time is 30 or more seconds.
  • NSF with SSO: NSF works in conjunction with SSO to ensure Layer 3 integrity following a switchover. It allows a router experiencing the failure of an active supervisor to continue forwarding data packets along known routes while the routing protocol information is recovered and validated. This forwarding can continue to occur even though peering arrangements with neighbor routers have been lost on the restarting router. NSF relies on the separation of the control plane and the data plane during supervisor switchover. The data plane continues to forward packets based on pre-switchover Cisco Express Forwarding (CEF) information. The control-plane implements graceful restart routing protocol extensions to signal a supervisor restart to NSF-aware neighbor routers, reform its neighbor adjacencies, and rebuild its routing protocol database (in the background) following a switchover. Because the standby supervisor is “hot,” the NSF/SSO switchover time is 0 to 3 seconds.
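The redundancy mode itself is selected in the redundancy configuration submode. For example, on platforms that support it, you could select RPR+ instead of SSO by entering the following commands (the exact mode keywords available vary by platform and Cisco IOS version):
Router(config)# redundancy
Router(config-red)# mode rpr-plus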
As previously described, neighbor nodes play a role in NSF function. A node that is capable of continuous packet forwarding during a route processor switchover is NSF-capable. Complementing this functionality, an NSF-aware peer router can enable neighbor recovery without resetting adjacencies and support routing database resynchronization in the background. Figure 1 illustrates the difference between NSF-capable and NSF-aware routers. To gain the greatest benefit from NSF/SSO deployment, NSF-capable routers should be peered with NSF-aware routers (although this is not strictly required for implementation), because only limited benefit is achieved unless routing peers are aware that the restarting node can continue packet forwarding and can assist in restoring and verifying the integrity of the routing tables after a switchover.

 
Figure 1: NSF-capable versus NSF-aware routers
Cisco NSF and SSO are designed to be deployed together. NSF relies on SSO to ensure that links and interfaces remain up during switchover and that lower layer protocol state is maintained. However, it is possible to enable SSO with or without NSF because these are configured separately.
The configuration to enable SSO is simple, as shown here:
Router(config)# redundancy
Router(config-red)# mode sso
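You can then confirm both the configured and the current operating redundancy mode, along with the state of the peer supervisor, by entering the following verification command:
Router# show redundancy states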
NSF, on the other hand, is configured within the routing protocol and is supported within EIGRP, OSPF, IS-IS and (to an extent) BGP. Sometimes NSF functionality is also called “graceful-restart.”
To enable NSF for EIGRP, enter the following commands:
Router(config)# router eigrp 100
Router(config-router)# nsf
Similarly, to enable NSF for OSPF, enter the following commands:
Router(config)# router ospf 100
Router(config-router)# nsf
Continuing the example, to enable NSF for IS-IS, enter the following commands:
Router(config)# router isis level2
Router(config-router)# nsf cisco
And finally, to enable NSF/graceful-restart for BGP, enter the following commands:
Router(config)# router bgp 100
Router(config-router)# bgp graceful-restart
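After configuring NSF, you can verify that it is in effect for each routing protocol. The following verification commands are representative examples; the relevant fields in their output vary by protocol and Cisco IOS version:
Router# show ip protocols
Router# show ip bgp neighbors
For the IGPs, the output indicates whether NSF is enabled for the process; for BGP, the neighbor output indicates whether the graceful restart capability was negotiated with each peer.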
You can see from the example of NSF that the line between device-level availability technologies and network availability technologies sometimes is blurry. A discussion of more network availability technologies follows.