BGP Add-Path

This chapter provides information about BGP Add-Path.

Applicability

The chapter was initially written for SR OS Release 14.0.R7, but the CLI in the current edition is based on SR OS Release 22.2.R2.

Overview

When a BGP router learns multiple paths for the same prefix, it selects one route as its best path and advertises only this route to its BGP peers. The BGP add-path feature allows advertising the best n paths for the same prefix, where n is configurable. If the set of n paths includes multiple paths with the same BGP next hop, only the best route with a specific next hop is advertised and the other paths are suppressed.

The BGP add-path feature increases path visibility in the Autonomous System (AS), because more routes are stored in the Routing Information Base (RIB). BGP add-path has the following benefits:

  • Faster convergence after failure

  • Enhanced load-sharing

  • Reduced routing churn

These benefits are described in the following sections.

Faster convergence after failure

RR advertises best path only – path A preferred over path B shows a network that does not support add-path. CE-4 advertises prefix 10.0.4.0/24 to both of its EBGP neighbors, PE-1 and PE-2, which results in two paths: A and B. PE-1 has an import policy that sets the local preference (LP) of path A to 200; PE-2 keeps the default LP of 100 for path B. Therefore, path A, which is advertised to PE-1, is preferred in AS 64496. The route reflector RR-5 advertises the preferred path A to PE-2 and PE-3. PE-2 suppresses the advertisement of its external path (B) to RR-5, because path A is preferred. Traffic from CE-6 to CE-4 is sent via PE-3 and PE-1.

Figure 1. RR advertises best path only – path A preferred over path B

When the link between CE-4 and PE-1 fails, the following steps take place for reconvergence:

  1. PE-1 sends a BGP update withdrawing path A to RR-5.

  2. RR-5 receives and propagates the withdrawal to its other clients: PE-2 and PE-3.

  3. PE-2 receives the withdrawal of path A and reruns the BGP decision process. PE-2 selects path B as its best route and advertises path B to RR-5.

  4. RR-5 receives the BGP update for path B and reruns its BGP decision process. RR-5 selects path B as its best path and advertises path B to its other clients: PE-1 and PE-3.

  5. PE-1 and PE-3 rerun their BGP decision process and determine that path B is the best path. Traffic can flow from CE-6 to CE-4 via PE-3 and PE-2.

Reconvergence after path failure (without add-path) shows the BGP updates sent to withdraw path A and advertise path B.

Figure 2. Reconvergence after path failure (without add-path)

If the propagation time of a BGP update message between RR-5 and any of its clients is X, the convergence time is four times X, plus processing, transmission, and queuing delays.

With add-path enabled on all BGP routers in AS 64496, the convergence time can be reduced considerably, because PE-3 has more than one path for prefix 10.0.4.0/24 in its RIB-IN before the failure takes place. When there are no failures, PE-2 decides that path A is best, but it also advertises its second-best path (B), which is its best external path, to RR-5. With add-path enabled, the RR has knowledge of two paths for prefix 10.0.4.0/24 and advertises both to its clients. PE-3 receives two routes for prefix 10.0.4.0/24, runs the BGP decision process, and updates its forwarding table based on the results. The following options are possible:

  • Path A is the best path, whereas path B is maintained in the RIB-IN. The FIB entry for destination 10.0.4.0/24 points at path {A} only.

  • When BGP FRR is enabled as described in chapter BGP Fast Reroute, path A is the best path and path B is the second-best path. The FIB entry for destination 10.0.4.0/24 points to path {A,B}. If path A is available, it is used for all traffic to the destination; if path A is unavailable but path B is available, then all traffic to the destination is directed to path B. In this case, path B is effectively a pre-computed, pre-installed backup path for the destination.

  • When Equal Cost Multi-Path (ECMP) and BGP multipath are enabled and the paths have an equal cost, both paths A and B represent the best path. The FIB entry for destination 10.0.4.0/24 points to multipath entry {A,B}. When both paths are available, traffic to the destination is load-shared across paths A and B. If only one path is available, traffic is directed to that available path.
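The three options above can be summarized in a short Python sketch. It is a simplified model and not SR OS behavior: the names `Path`, `install`, and `forward` and the mode labels `"plain"`, `"frr"`, and `"ecmp"` are hypothetical, and paths are ranked on local preference only instead of the full BGP decision process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int


def install(paths, mode):
    """Return the path names placed in the FIB entry for one prefix.

    mode is a hypothetical label: "plain", "frr", or "ecmp".
    """
    # Highest local preference first; ties keep their input order.
    ranked = sorted(paths, key=lambda p: p.local_pref, reverse=True)
    best = ranked[0]
    if mode == "ecmp":
        # Multipath: every path that ties with the best one is installed.
        return [p.name for p in ranked if p.local_pref == best.local_pref]
    if mode == "frr":
        # Primary path plus one pre-installed backup path.
        return [p.name for p in ranked[:2]]
    return [best.name]


def forward(entry, mode, available):
    """Return the paths that carry traffic, given which paths are up."""
    live = [name for name in entry if name in available]
    if mode == "ecmp":
        return live      # load-share across all live members
    return live[:1]      # primary if up, otherwise the backup (if any)
```

With path A at LP 200 and path B at LP 100, `install(paths, "frr")` returns `["A", "B"]`, and `forward` switches from `["A"]` to `["B"]` as soon as only B is available. In `"plain"` mode the entry holds only A, so `forward` returns an empty list until BGP has reconverged.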

Advertised paths when BGP add-path is enabled in PEs and RR shows the BGP update messages prior to any failures. RR-5 receives path A from PE-1 and path B from PE-2, whereas it advertises path B to PE-1, path A to PE-2, and both path A and path B to PE-3. Path B has the default LP 100, whereas path A gets LP 200 as per import policy on PE-1. However, in case of ECMP, both paths keep the default LP 100.

Figure 3. Advertised paths when BGP add-path is enabled in PEs and RR

Reconvergence after path failure when BGP add-path is enabled shows the BGP update messages that are sent after a link failure between CE-4 and PE-1. With add-path, fewer steps are required for convergence:

  1. PE-1 sends a BGP update message withdrawing path A.

  2. RR-5 receives the withdrawal and propagates it to its clients PE-2 and PE-3.

  3. PE-2 and PE-3 receive the withdrawal, rerun the BGP decision process, and update the forwarding entry for destination 10.0.4.0/24: path B is best.

Figure 4. Reconvergence after path failure when BGP add-path is enabled

The convergence time with add-path is much shorter than without add-path. If X is the propagation time of a BGP update message between RR and any of the PEs, then the convergence time is the time required for the BGP update from PE-1 to RR-5 (X) plus the time required for the BGP update propagation from RR-5 to the other PEs (X), in addition to delays for processing, transmission, and queuing. Counting propagation time only, convergence takes 2X with add-path instead of 4X without it, so it is about twice as fast.
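The comparison can be restated as simple arithmetic. The following Python sketch counts the BGP update propagations on the critical path for both cases; the function names and the example value of 10 ms for X are assumptions chosen for illustration, and all other delays are lumped into one optional term.

```python
def critical_path_steps(add_path: bool) -> list[str]:
    """List the sequential BGP update propagations needed for reconvergence."""
    steps = [
        "PE-1 -> RR-5: withdraw path A",
        "RR-5 -> PE-2, PE-3: withdraw path A",
    ]
    if not add_path:
        # Path B was suppressed by PE-2, so it must still be advertised.
        steps += [
            "PE-2 -> RR-5: advertise path B",
            "RR-5 -> PE-1, PE-3: advertise path B",
        ]
    return steps


def convergence_time(x_ms: float, add_path: bool, other_delays_ms: float = 0.0) -> float:
    """Propagation time X per step, plus processing/transmission/queuing delays."""
    return len(critical_path_steps(add_path)) * x_ms + other_delays_ms


# Example with an assumed propagation time of X = 10 ms per update:
without_add_path = convergence_time(10, add_path=False)   # 4 * X
with_add_path = convergence_time(10, add_path=True)       # 2 * X
```

The factor of two holds only for the propagation component; the processing, transmission, and queuing delays are not modeled here and do not necessarily scale the same way.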

For some types of failures, the convergence can be even faster:

  • When PE-1 becomes unreachable, the next-hop tracking by PE-3 will invalidate path A before the BGP withdrawal message is received from RR-5.

  • If PE-3 implements BGP FRR and path A has been marked as unusable, PE-3 can switch traffic destined to 10.0.4.0/24 to path B.

  • When Bidirectional Forwarding Detection (BFD) is enabled on the EBGP sessions and on the IGP protocol, the failure is detected faster and BGP convergence can be sped up when BGP FRR is enabled.

Enhanced load-sharing

When paths A and B are equal in cost or preference, and ECMP and BGP multipath are enabled on all PEs, load-sharing can be done for traffic with destination 10.0.4.0/24. With BGP add-path, both paths A and B are advertised to the PEs. PE-3 runs the BGP decision process and determines that paths A and B are both best paths to destination 10.0.4.0/24, so paths A and B are combined into one multipath forwarding entry: {A,B}.

The benefits of load-sharing for traffic to destination 10.0.4.0/24 are the following:

  • More even bandwidth utilization of the links in AS 64496

  • More even bandwidth utilization for traffic across peering points PE-1 and PE-2 with AS 64500

  • Faster reaction to some failures; for example, when the BGP next hop for one of the paths becomes unreachable in the IGP and next-hop tracking is enabled.

Reduced routing churn

Routing churn refers to repeated advertisements and withdrawals of a prefix and path. Some degree of routing churn is normal and expected in most networks. However, it should be contained as much as possible to avoid overloading router CPUs. Routing churn can be caused by:

  • Flapping links (links that repeatedly transition between up and down state)

  • Route oscillation (in networks that use RRs or AS confederations and where BGP path selection relies on Multi Exit Discriminator (MED) and IGP cost comparisons)

Add-path helps to reduce routing churn by constraining the effect of some failures to the local AS where they occur. For example, the link between CE-4 and PE-1 could repeatedly cycle up and down due to a misconfiguration. Each time the link goes down, PE-1 sends a BGP withdrawal message to RR-5, and RR-5 propagates it to the other RR clients (PE-2 and PE-3). PE-3 repeatedly withdraws and re-advertises path A to its EBGP peer CE-6 in AS 64501, but path B remains advertised to CE-6 throughout (provided that add-path has been negotiated between PE-3 and CE-6).

Without add-path, CE-6 would be affected by the instability in AS 64496 and there would be periods of time when AS 64501 has no paths to destination 10.0.4.0/24 (between the withdrawal of path A and the advertisement of path B).

Add-path implementation

BGP add-path is configured in the base routing instance, for IBGP or EBGP, per address family at different levels: in the global bgp context, per group, and per neighbor. The following address families are supported:

*A:PE-1>config>router>bgp# add-paths ?
  - add-paths
  - no add-paths

 [no] evpn            - Configure evpn ADD-PATH limits
 [no] ipv4            - Configure ipv4 ADD-PATH limits
 [no] ipv6            - Configure ipv6 ADD-PATH limits
 [no] label-ipv4      - Configure label-ipv4 ADD-PATH limits
 [no] label-ipv6      - Configure label-ipv6 ADD-PATH limits
 [no] mcast-vpn-ipv4  - Configure mcast-vpn-ipv4 ADD-PATH limits
 [no] mcast-vpn-ipv6  - Configure mcast-vpn-ipv6 ADD-PATH limits
 [no] mvpn-ipv4       - Configure mvpn-ipv4 ADD-PATH limits
 [no] mvpn-ipv6       - Configure mvpn-ipv6 ADD-PATH limits
 [no] vpn-ipv4        - Configure vpn-ipv4 ADD-PATH limits
 [no] vpn-ipv6        - Configure vpn-ipv6 ADD-PATH limits

Up to 16 paths are configurable per address family per peer (send-limit):

*A:PE-1>config>router>bgp>add-paths# ipv4 ?
  - ipv4 send <send-limit>
  - ipv4 send <send-limit> receive [none]
  - no ipv4

 <send-limit>         : [1..16]|none|multipaths

Only the number of advertised routes per prefix is controlled, not the number of received routes. All routes advertised by an add-path peer are accepted; otherwise, routing loops might occur. If a BGP speaker is configured with <send-limit> n, but has more than n paths available in the LOC-RIB, it selects the n best paths with unique BGP next hops following the Add-n path selection algorithm described in draft-ietf-idr-add-paths-guidelines. Also, the send limit n can be overridden, for specific prefixes, using route policies.
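As an illustration of the send limit, the following Python sketch picks at most n paths with unique BGP next hops. It is not the SR OS implementation: `select_add_paths` is a hypothetical name, and the ranking uses local preference only as a stand-in for the full BGP decision process.

```python
from typing import NamedTuple


class Path(NamedTuple):
    next_hop: str
    local_pref: int


def select_add_paths(paths, send_limit):
    """Return up to send_limit paths, best first, one per BGP next hop."""
    if send_limit < 1:
        return []
    # Assumption: a higher local preference means a better path.
    ranked = sorted(paths, key=lambda p: p.local_pref, reverse=True)
    chosen, seen_next_hops = [], set()
    for path in ranked:
        if path.next_hop in seen_next_hops:
            continue  # a better path with this next hop is already selected
        chosen.append(path)
        seen_next_hops.add(path.next_hop)
        if len(chosen) == send_limit:
            break
    return chosen


candidates = [
    Path("192.0.2.1", 200),
    Path("192.0.2.1", 150),   # same next hop as the best path: suppressed
    Path("192.0.2.2", 100),
    Path("192.0.2.3", 90),
]
advertised = select_add_paths(candidates, send_limit=2)
```

In this example, `advertised` contains the paths via 192.0.2.1 (LP 200) and 192.0.2.2 (LP 100); the second path via 192.0.2.1 is skipped because a better path with the same next hop has already been selected.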

When BGP add-path is configured for an address family, the BGP capability will be announced to the BGP peer as part of the BGP open message, as follows:

# Enable debugging for BGP open messages on PE-1:
debug 
    router "Base"
        bgp
            open
        exit

58 2022/05/04 08:04:37.417 UTC MINOR: DEBUG #2001 Base BGP
"BGP: OPEN
Peer 1: 192.0.2.5 - Send (Passive) BGP OPEN: Version 4
   AS Num 64496: Holdtime 90: BGP_ID 192.0.2.1: Opt Length 26 (ExtOpt F)
   Opt Para: Type CAPABILITY: Length = 24: Data:
     Cap_Code GRACEFUL-RESTART: Length 2
       Bytes: 0x0 0x78
     Cap_Code MP-BGP: Length 4
       Bytes: 0x0 0x1 0x0 0x1
     Cap_Code ROUTE-REFRESH: Length 0
     Cap_Code 4-OCTET-ASN: Length 4
       Bytes: 0x0 0x0 0xfb 0xf0
     Cap_Code ADD-PATH: Length 4
       Bytes: 0x0 0x1 0x1 0x3
"

The BGP add-path capability code value typically consists of one or more blocks of four bytes; two octets for the Address Family Identifier (AFI), one octet for the Subsequent Address Family Identifier (SAFI), and one octet for send/receive. In this example, AFI/SAFI bytes point to an IPv4 address family and send/receive value "3" means that the sender is able to receive and send multiple paths from/to its BGP peer.
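To make the byte layout concrete, the following Python sketch decodes the value of the ADD-PATH capability shown in the debug output (0x0 0x1 0x1 0x3). The function name and the returned tuple format are hypothetical; the field layout and the send/receive values (1 = receive, 2 = send, 3 = both) follow the capability definition in RFC 7911.

```python
import struct

# Send/receive values as defined for the ADD-PATH capability (RFC 7911).
SEND_RECEIVE = {1: "receive", 2: "send", 3: "send/receive"}


def decode_add_path_capability(value: bytes):
    """Decode the capability value into (AFI, SAFI, mode) tuples."""
    if len(value) % 4 != 0:
        raise ValueError("ADD-PATH capability value must be a multiple of 4 bytes")
    entries = []
    for offset in range(0, len(value), 4):
        # 2-octet AFI, 1-octet SAFI, 1-octet send/receive, network byte order.
        afi, safi, mode = struct.unpack_from("!HBB", value, offset)
        entries.append((afi, safi, SEND_RECEIVE.get(mode, f"unknown({mode})")))
    return entries


decoded = decode_add_path_capability(bytes([0x0, 0x1, 0x1, 0x3]))
```

For the bytes in the example, `decoded` is `[(1, 1, "send/receive")]`: AFI 1 (IPv4), SAFI 1 (unicast), and the ability to both send and receive multiple paths.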

In BGP update messages, a 4-octet path identifier (ID) is added to the Network Layer Reachability Information (NLRI) field. The combination of both prefix and path ID identifies a BGP path. SR OS allocates path IDs sequentially on a per address family basis, not per prefix. The path ID is only locally significant, which means that when a BGP speaker re-advertises a route with path IDs, it must generate its own path ID.

# Enable debugging for BGP UPDATE messages on RR-5:
debug
    router "Base"
        bgp
            update
        exit

RR-5 received the following BGP update for prefix 10.0.4.0/24 with path ID.

50 2022/05/04 08:05:07.380 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.2
"Peer 1: 192.0.2.2: UPDATE
Peer 1: 192.0.2.2 - Received BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 27
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 2 Len: 6 AS Path:
        Type: 2 Len: 1 < 64500 >
    Flag: 0x40 Type: 3 Len: 4 Nexthop: 192.0.2.2
    Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
    NLRI: Length = 8
        10.0.4.0/24 Path-ID 8
"

When routers have negotiated to advertise (and receive) routes with path identifiers, all BGP updates (advertisements or withdrawals) without a path identifier are rejected. Such an update has an unexpected length, which causes an NLRI parsing error, and a notification is sent.
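The length difference can be reproduced with a small Python sketch of the IPv4 NLRI encoding. The function name `encode_nlri` is hypothetical; the layout (an optional 4-octet path ID, followed by a 1-octet prefix length and the minimum number of prefix octets) follows RFC 4271 and RFC 7911.

```python
import ipaddress
import struct


def encode_nlri(prefix: str, path_id=None) -> bytes:
    """Encode one IPv4 NLRI entry, optionally prefixed by a 4-octet path ID."""
    network = ipaddress.ip_network(prefix)
    prefix_octets = (network.prefixlen + 7) // 8   # only the significant octets
    body = bytes([network.prefixlen]) + network.network_address.packed[:prefix_octets]
    if path_id is not None:
        body = struct.pack("!I", path_id) + body
    return body


plain = encode_nlri("10.0.4.0/24")                  # 1 + 3 = 4 octets
with_path_id = encode_nlri("10.0.4.0/24", path_id=8)  # 4 + 1 + 3 = 8 octets
```

The two lengths match the `NLRI: Length = 4` and `NLRI: Length = 8` values in the debug output of this chapter. A receiver that expects a path ID would interpret the first four octets of `plain` as the path ID, which is why an update without path ID fails to parse.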

Configuration

The following configuration examples are in this section:

  • BGP without add-path

  • BGP with add-path for address family IPv4: no BGP FRR, no ECMP

  • BGP with add-path for address family IPv4 and BGP FRR enabled

  • BGP with add-path for address family IPv4 and ECMP enabled

  • BGP with add-path for address family VPN-IPv4 and BGP FRR enabled

  • BGP with add-path for address family VPN-IPv4 and ECMP enabled

Example topology shows the example topology with CE-4 in AS 64500 advertising route 10.0.4.0/24 to its EBGP peers PE-1 and PE-2 in AS 64496. PE-1 has an import policy that sets the LP for this route to 200, whereas PE-2 keeps the default local preference of 100. RR-5 is RR for all PEs in AS 64496. CE-6 in AS 64501 peers with PE-3 in AS 64496 and can send traffic to CE-4 in AS 64500.

Figure 5. Example topology

Initial configuration

The initial configuration on all nodes includes:

  • Cards, MDAs, ports

  • Router interfaces

  • IS-IS as IGP on all interfaces within AS 64496 (alternatively, OSPF can be used)

  • LDP on all interfaces between the PEs in AS 64496, but not toward RR-5

BGP is configured on all the nodes. CE-4 peers with PE-1 and PE-2 and exports prefix 10.0.4.0/24 to both EBGP peers, as follows:

# on CE-4:
configure
    router Base
        autonomous-system 64500
        policy-options
            begin
            prefix-list "10.0.4.0/24"
                prefix 10.0.4.0/24 exact
            exit
            policy-statement "export-bgp"
                entry 10
                    from
                        prefix-list "10.0.4.0/24"
                    exit
                    action accept
                    exit
                exit
            exit
            commit
        exit
        bgp
            rapid-withdrawal
            split-horizon
            group "EBGP"
                export "export-bgp"
                peer-as 64496
                neighbor 172.16.14.1
                exit
                neighbor 172.16.24.1
                exit
            exit

The BGP configuration on CE-6 is similar.

PE-1 peers with CE-4 in AS 64500 and RR-5 in AS 64496. An import policy is configured to set the LP to 200 for all routes received from CE-4, as follows:

# on PE-1:
configure
    router Base
        autonomous-system 64496
        policy-options
            begin
            policy-statement "import-bgp-LP200"
                default-action accept
                    local-preference 200
                exit
            exit
            commit
        exit 
        bgp
            rapid-withdrawal
            split-horizon
            group "EBGP"
                import "import-bgp-LP200"
                peer-as 64500
                neighbor 172.16.14.2
                exit
            exit
            group "IBGP"
                next-hop-self
                peer-as 64496
                neighbor 192.0.2.5
                exit
            exit

The BGP configuration on PE-2 and PE-3 is similar, but there is no import policy.

The BGP configuration on RR-5 is as follows:

# on RR-5:
configure
    router Base
        autonomous-system 64496
        bgp
            rapid-withdrawal
            split-horizon
            group "IBGP"
                cluster 192.0.2.5
                peer-as 64496
                neighbor 192.0.2.1
                exit
                neighbor 192.0.2.2
                exit
                neighbor 192.0.2.3
                exit
            exit

PE-1 advertises a route for prefix 10.0.4.0/24 with LP 200 to RR-5. RR-5 propagates this route to its other clients: PE-2 and PE-3. When PE-2 learns this route, it does not advertise its own route for 10.0.4.0/24 with LP 100 to RR-5 anymore. PE-3 only learns the route for prefix 10.0.4.0/24 with LP 200, as follows:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        200         None
      192.0.2.1                                          None        10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

Reconvergence without add-path

A failure of the link between CE-4 and PE-1 is simulated as follows:

# on CE-4:
configure 
    router Base
        interface "int-CE-4-PE-1" 
            shutdown

The following four BGP update messages are received or sent by RR-5.

RR-5 receives the following withdrawal message from PE-1:

# on RR-5:
28 2022/05/04 08:00:38.222 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.1
"Peer 1: 192.0.2.1: UPDATE
Peer 1: 192.0.2.1 - Received BGP UPDATE:
    Withdrawn Length = 4
        10.0.4.0/24
    Total Path Attr Length = 0
"

RR-5 propagates this withdrawal to its other clients, for example to PE-2, as follows:

# on RR-5:
29 2022/05/04 08:00:38.223 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.2
"Peer 1: 192.0.2.2: UPDATE
Peer 1: 192.0.2.2 - Send BGP UPDATE:
    Withdrawn Length = 4
        10.0.4.0/24
    Total Path Attr Length = 0
"

When PE-2 receives this withdrawal, it reruns the BGP decision process and decides that its route for prefix 10.0.4.0/24 with LP 100 is the best route. PE-2 advertises this route to RR-5; it is received by RR-5 as follows:

# on RR-5:
31 2022/05/04 08:00:57.380 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.2
"Peer 1: 192.0.2.2: UPDATE
Peer 1: 192.0.2.2 - Received BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 27
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 2 Len: 6 AS Path:
        Type: 2 Len: 1 < 64500 >
    Flag: 0x40 Type: 3 Len: 4 Nexthop: 192.0.2.2
    Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
    NLRI: Length = 4
        10.0.4.0/24
"

RR-5 propagates this message to its other clients: PE-1 and PE-3. The following BGP update is sent to PE-3:

# on RR-5:
32 2022/05/04 08:01:00.618 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.3
"Peer 1: 192.0.2.3: UPDATE
Peer 1: 192.0.2.3 - Send BGP UPDATE:
    Withdrawn Length = 0
    Total Path Attr Length = 41
    Flag: 0x40 Type: 1 Len: 1 Origin: 0
    Flag: 0x40 Type: 2 Len: 6 AS Path:
        Type: 2 Len: 1 < 64500 >
    Flag: 0x40 Type: 3 Len: 4 Nexthop: 192.0.2.2
    Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
    Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.2
    Flag: 0x80 Type: 10 Len: 4 Cluster ID:
        192.0.2.5
    NLRI: Length = 4
        10.0.4.0/24
"

Again, PE-3 has only one route for prefix 10.0.4.0/24, but this time with next hop 192.0.2.2, as follows:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        100         None
      192.0.2.2                                          None        10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

The configuration is restored as follows:

# on CE-4:
configure 
    router Base
        interface "int-CE-4-PE-1" 
            no shutdown

Add-path enabled: no BGP FRR, no ECMP

Before add-path is enabled, the following information is displayed on PE-1 for BGP neighbor RR-5:

*A:PE-1# show router bgp neighbor 192.0.2.5 | match "Local AddPath" post-lines 2
Local AddPath Capabi*: Disabled
Remote AddPath Capab*: Send - None
                     : Receive - None

Add-path is enabled on PE-1 and PE-2 for groups "EBGP" and "IBGP" with a send limit of two paths. The number of received paths is not limited, which is the default setting. The configuration is as follows:

# on PE-1 and PE-2:
configure
    router Base
        bgp
            group "EBGP"
                add-paths
                    ipv4 send 2 receive
                exit
            exit 
            group "IBGP" 
                add-paths
                    ipv4 send 2 receive
                exit
            exit

When the preceding show command is repeated on PE-1 or PE-2, the local BGP add-path capabilities are specified for address family IPv4: a maximum of two paths can be sent for a specific IPv4 prefix. The remote peer RR-5 does not have add-path enabled yet.

*A:PE-1# show router bgp neighbor 192.0.2.5 | match "Local AddPath" post-lines 3
Local AddPath Capabi*: Send - ipv4 (2)
                     : Receive - ipv4
Remote AddPath Capab*: Send - None
                     : Receive - None

Initially, add-path remains disabled on PE-3. On the RR, add-path is enabled for neighbors 192.0.2.1 and 192.0.2.2, but not for 192.0.2.3 yet. For neighbor 192.0.2.1, the receive none option implies that the add-path receive capability is not negotiated.

# on RR-5:
configure
    router Base
        bgp
            group "IBGP"
                neighbor 192.0.2.1
                    add-paths
                        ipv4 send 2 receive none
                    exit
                exit
            exit
            group "IBGP"
                neighbor 192.0.2.2
                    add-paths
                        ipv4 send 2 receive
                    exit
                exit

The following output shows that add-path is enabled locally on RR-5 and remotely on PE-1 for address family IPv4. RR-5 can send a maximum of two paths for a specific prefix toward PE-1 and PE-2; toward PE-3, add-path remains disabled.

*A:RR-5# show router bgp neighbor 192.0.2.1 | match "Local AddPath" post-lines 3 
Local AddPath Capabi*: Send - ipv4 (2)
                     : Receive - None
Remote AddPath Capab*: Send - ipv4
                     : Receive - ipv4
*A:RR-5# show router bgp neighbor 192.0.2.2 | match "Local AddPath" post-lines 3 
Local AddPath Capabi*: Send - ipv4 (2)
                     : Receive - ipv4
Remote AddPath Capab*: Send - ipv4
                     : Receive - ipv4
*A:RR-5# show router bgp neighbor 192.0.2.3 | match "Local AddPath" post-lines 2 
Local AddPath Capabi*: Disabled
Remote AddPath Capab*: Send - None
                     : Receive - None

The receive none option indicates that RR-5 does not negotiate the add-path receive capability with its peer. PE-1 therefore knows that peer 192.0.2.5 may send IPv4 routes with a path ID, but RR-5 has not signaled that it can receive them:

*A:PE-1# show router bgp neighbor 192.0.2.5 | match "Local AddPath" post-lines 3 
Local AddPath Capabi*: Send - ipv4 (2)
                     : Receive - ipv4
Remote AddPath Capab*: Send - ipv4
                     : Receive - None
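The outcome of the negotiation can be expressed per direction: multiple paths flow from one speaker to the other only when the sender advertised the send capability and the receiver advertised the receive capability, as specified in RFC 7911. The following Python sketch applies this rule to the capabilities shown above; the function name and the dictionary keys are hypothetical.

```python
def add_path_directions(local_send, local_receive, remote_send, remote_receive):
    """Return in which directions routes with path IDs can be exchanged."""
    return {
        "local_to_remote": local_send and remote_receive,
        "remote_to_local": remote_send and local_receive,
    }


# RR-5 (local) toward PE-1 (remote): RR-5 is configured with "receive none".
rr5_pe1 = add_path_directions(
    local_send=True, local_receive=False, remote_send=True, remote_receive=True
)

# RR-5 (local) toward PE-2 (remote): both sides send and receive.
rr5_pe2 = add_path_directions(
    local_send=True, local_receive=True, remote_send=True, remote_receive=True
)
```

For the session with PE-1, only `local_to_remote` is true: RR-5 can send two paths to PE-1, but PE-1 sends routes without path ID to RR-5. For the session with PE-2, both directions are true.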

With BGP add-path enabled, PE-2 will advertise its second-best route for prefix 10.0.4.0/24 with LP 100 to RR-5. PE-1, PE-2, and RR-5 will have two routes for prefix 10.0.4.0/24 in their RIB-IN, but only the route with LP 200 will be used. The following output shows the BGP routes on RR-5, but it resembles the output on PE-1 and PE-2:

*A:RR-5# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.5        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        200         None
      192.0.2.1                                          None        10
      64500                                                          -
*i    10.0.4.0/24                                        100         None
      192.0.2.2                                          1           10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

Even though RR-5 has two routes for this prefix, it only advertises its best route to PE-3, because add-path is not enabled for this BGP session. Therefore, PE-3 only has the route for 10.0.4.0/24 with LP 200, as follows:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        200         None
      192.0.2.1                                          None        10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

When add-path is enabled on the session between PE-3 and RR-5, the second route will also be advertised, as follows:

# on PE-3:
configure
    router Base
        bgp
            group "IBGP"
                add-paths
                    ipv4 send 2 receive
                exit

# on RR-5:
configure
    router Base
        bgp
            group "IBGP"
                neighbor 192.0.2.3
                    add-paths
                        ipv4 send 2 receive
                    exit

PE-3 now has two routes for prefix 10.0.4.0/24:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        200         None
      192.0.2.1                                          14          10
      64500                                                          -
*i    10.0.4.0/24                                        100         None
      192.0.2.2                                          15          10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

BGP add-path is enabled, but BGP FRR and ECMP are disabled. Therefore, the routing table on PE-3 contains only one entry for prefix 10.0.4.0/24:

*A:PE-3# show router route-table 10.0.4.0/24

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.0.4.0/24                                   Remote  BGP       00h00m29s  170
       192.168.13.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Reconverge with add-path: no BGP FRR, no ECMP

A link failure between CE-4 and PE-1 is simulated as follows:

# on CE-4:
configure
    router Base
        interface "int-CE-4-PE-1" 
            shutdown

PE-1 sends a withdrawal message for route 10.0.4.0/24 with LP 200 to RR-5 and reruns the BGP decision process. RR-5 propagates this withdrawal message to its other clients, which also rerun the BGP decision process. As a result, the route for prefix 10.0.4.0/24 with LP 100 is used on all nodes; for example, on PE-3:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        100         None
      192.0.2.2                                          15          10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
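
The effect of the withdrawal on PE-3 can be pictured with a minimal Python sketch. It assumes a heavily simplified decision process that compares only the local preference; the `Path` class and `best_path` function are illustrative names, not part of SR OS. The next hops, LP values, and path IDs are taken from the show output before the failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    next_hop: str
    local_pref: int
    path_id: int


def best_path(paths):
    """Return the path with the highest local preference, or None if empty.

    Only the local-preference step is modeled (assumption); a real BGP
    decision process has many more tie-breakers.
    """
    return max(paths, key=lambda p: p.local_pref, default=None)


# Paths held by PE-3 before the failure; add-path delivered both of them.
rib_in = [
    Path("10.0.4.0/24", "192.0.2.1", 200, 14),
    Path("10.0.4.0/24", "192.0.2.2", 100, 15),
]
before = best_path(rib_in)

# RR-5 propagates the withdrawal of the path via PE-1 (path ID 14).
rib_in = [p for p in rib_in if p.path_id != 14]
after = best_path(rib_in)

print(before.next_hop, "->", after.next_hop)
```

Because the path via PE-2 is already present on PE-3, the rerun selects it without waiting for a new advertisement from RR-5.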

The routing table contains a route to 10.0.4.0/24 with PE-2 as next hop, as follows:

*A:PE-3# show router route-table 10.0.4.0/24

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.0.4.0/24                                   Remote  BGP       00h00m10s  170
       192.168.23.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Convergence with add-path enabled takes about half as long as without it, because only half as many sequential BGP messages are needed. With BGP add-path disabled, four sequential messages are sent:

  1. PE-1 sends a withdrawal to RR-5.

  2. RR-5 propagates withdrawal.

  3. PE-2 advertises its route.

  4. RR-5 propagates the route.

In the scenario with add-path, the last two messages have already been sent before the failure occurs. During convergence, only two withdrawal messages are sent: PE-1 sends a withdrawal to RR-5, and RR-5 propagates this withdrawal to its clients.
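
The message count can be restated as a small Python model. It assumes that the messages are strictly sequential and that each one takes the same time (`per_message_delay` is a hypothetical parameter); real convergence time also depends on failure detection and processing delays that are not modeled here.

```python
# Sequential BGP messages after the CE-4/PE-1 link fails (from the list above).
MESSAGES = [
    ("PE-1 -> RR-5", "withdraw", "path A"),
    ("RR-5 -> clients", "withdraw", "path A"),
    ("PE-2 -> RR-5", "advertise", "path B"),
    ("RR-5 -> clients", "advertise", "path B"),
]


def messages_after_failure(add_path):
    """Return the messages that still have to be exchanged after the failure."""
    if add_path:
        # Path B was advertised before the failure, so only withdrawals remain.
        return [m for m in MESSAGES if m[1] == "withdraw"]
    return list(MESSAGES)


def convergence_time(add_path, per_message_delay=1.0):
    """Total delay if messages are sequential and equally slow (assumption)."""
    return len(messages_after_failure(add_path)) * per_message_delay


print(convergence_time(False) / convergence_time(True))  # 2.0
```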

Add-path and BGP FRR

The convergence time can be further reduced by enabling BGP FRR, where the BGP decision process runs for the best route and the backup path before any failure happens, as described in chapter BGP Fast Reroute. On all PEs, BGP FRR is enabled for the IPv4 address family, as follows:

# on all PEs:
configure
    router Base
        bgp 
            backup-path ipv4

Each PE has two routes for prefix 10.0.4.0/24. With BGP FRR enabled, both are used: one as the active route and the other as backup, as indicated by the "b" flag in the following output:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        200         None
      192.0.2.1                                          20          10
      64500                                                          -
ub*i  10.0.4.0/24                                        100         None
      192.0.2.2                                          15          10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================
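
The flag strings in this output can be decoded with the legend printed above the table. The following Python sketch does so by simple lookup; `decode_flags` is an illustrative helper, not an SR OS tool, and the two tables are copied from the legend.

```python
# Tables copied from the "Legend" block of the show output.
STATUS_CODES = {
    "u": "used", "s": "suppressed", "h": "history", "d": "decayed",
    "*": "valid", "l": "leaked", "x": "stale", ">": "best",
    "b": "backup", "p": "purge",
}
ORIGIN_CODES = {"i": "IGP", "e": "EGP", "?": "incomplete"}


def decode_flags(flags):
    """Split a flag string such as 'ub*i' into status names and an origin name."""
    status = [STATUS_CODES[c] for c in flags if c in STATUS_CODES]
    origin = [ORIGIN_CODES[c] for c in flags if c in ORIGIN_CODES]
    return status, (origin[0] if origin else None)


print(decode_flags("u*>i"))  # (['used', 'valid', 'best'], 'IGP')
print(decode_flags("ub*i"))  # (['used', 'backup', 'valid'], 'IGP')
```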

The following routing table on PE-3 shows the active route for 10.0.4.0/24, with the "B" flag indicating that a BGP backup route is available:

*A:PE-3# show router route-table 10.0.4.0/24

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.0.4.0/24 [B]                               Remote  BGP       00h00m49s  170
       192.168.13.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

The following output shows both the active and the backup route for prefix 10.0.4.0/24:

*A:PE-3# show router route-table 10.0.4.0/24 alternative

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                   Metric
      Alt-NextHop                                                Alt-
                                                                 Metric
-------------------------------------------------------------------------------
10.0.4.0/24                                   Remote  BGP       00h00m49s  170
       192.168.13.1                                                 10
10.0.4.0/24 (Backup)                          Remote  BGP       00h00m49s  170
       192.168.23.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
       Backup = BGP backup route
       LFA = Loop-Free Alternate nexthop
       S = Sticky ECMP requested
===============================================================================

If the link between CE-4 and PE-1 fails, the same BGP withdrawals are sent from PE-1 to RR-5 and from RR-5 to PE-2 and PE-3. When PE-2 and PE-3 receive the withdrawal, the BGP decision process does not need to run again: the backup path is promoted to active immediately.
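
As a rough illustration of why the precomputed backup helps, the following Python sketch models a forwarding entry that stores both the primary and the backup next hop. The `FibEntry` class is hypothetical and only mirrors the behavior described above; it does not reflect the internal SR OS data structures.

```python
class FibEntry:
    """Forwarding entry with a primary next hop and an optional precomputed backup."""

    def __init__(self, primary, backup=None):
        self.primary = primary
        self.backup = backup

    def on_withdraw(self, next_hop):
        """Handle withdrawal of next_hop; return True if forwarding can continue."""
        if next_hop == self.backup:
            self.backup = None
        elif next_hop == self.primary:
            # Promote the backup directly; no path selection is rerun here.
            self.primary, self.backup = self.backup, None
        return self.primary is not None


# BGP next hops of the two paths for 10.0.4.0/24 as seen on PE-3.
entry = FibEntry(primary="192.0.2.1", backup="192.0.2.2")
still_forwarding = entry.on_withdraw("192.0.2.1")
print(still_forwarding, entry.primary)
```

The promotion is a constant-time swap, which is the point of computing the backup before the failure.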

BGP FRR is disabled on the PEs as follows:

# on all PEs:
configure 
    router Base
        bgp 
            no backup-path

Add-path and ECMP

On PE-1, the import policy is removed to have paths with equal cost:

# on PE-1:
configure 
    router Base
        bgp 
            group "EBGP" 
                no import

ECMP is enabled on all PEs with a value of two, as follows:

# on all PEs:
configure 
    router Base
        ecmp 2

On all PEs, BGP multipath is configured with the maximum number of paths equal to two in the bgp context, as follows:

# on all PEs:
configure 
    router Base
        bgp 
            multi-path
                maximum-paths 2

For more information about BGP multipath, see chapter BGP Multipath.

All PEs have two routes for prefix 10.0.4.0/24 and both are active when ECMP is enabled; for example, for PE-3, as follows:

*A:PE-3# show router bgp routes 10.0.4.0/24
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.0.4.0/24                                        100         None
      192.0.2.1                                          20          10
      64500                                                          -
u*>i  10.0.4.0/24                                        100         None
      192.0.2.2                                          15          10
      64500                                                          -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

*A:PE-3# show router route-table 10.0.4.0/24

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.0.4.0/24                                   Remote  BGP       00h00m54s  170
       192.168.13.1                                                 10
10.0.4.0/24                                   Remote  BGP       00h00m54s  170
       192.168.23.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Traffic flows with destination 10.0.4.0/24 will be sprayed over the two active paths.
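
Per-flow distribution over equal-cost next hops is commonly implemented by hashing packet header fields. The Python sketch below illustrates the idea only: the CRC32 hash, the choice of header fields, and the flow values are assumptions for illustration and do not describe the SR OS hashing algorithm.

```python
import zlib


def pick_next_hop(flow, next_hops):
    """Map a flow tuple to one next hop; the same flow always maps to the same hop."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# The two active next hops from the route table on PE-3.
next_hops = ["192.168.13.1", "192.168.23.1"]

# Hypothetical flow: source IP, destination IP, protocol, source port, destination port.
flow = ("10.0.6.1", "10.0.4.1", 6, 40000, 443)

print(pick_next_hop(flow, next_hops))
```

Hashing per flow rather than per packet keeps the packets of one flow on one path, which avoids reordering.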

Add-path for family VPN-IPv4 with BGP FRR

Example topology with VPRNs shows the topology with VPRN 1 configured on the PEs in AS 64496. CE-4 exports prefix 172.31.0.0/16 to VPRN 1 on PE-1 and PE-2.

Figure 6. Example topology with VPRNs

VPRN 1 is configured on all PEs in AS 64496, but not on the RR. BGP FRR is enabled in the VPRN with the enable-bgp-vpn-backup option. The configuration of VPRN 1 is similar on all PEs; for example, for PE-1, the VPRN configuration is as follows:

# on PE-1:
configure
    router Base
        policy-options
            begin
            policy-statement "export-bgp"
                entry 10
                    from
                        protocol bgp-vpn
                    exit
                    to
                        protocol bgp
                    exit
                    action accept
                    exit
                exit
            exit
            policy-statement "import-bgp-LP200"
                default-action accept
                    local-preference 200
                exit
            exit
            commit
        exit
    exit
    service
        vprn 1 name "VPRN 1" customer 1 create
            autonomous-system 64496
            enable-bgp-vpn-backup ipv4          # BGP FRR
            interface "int-PE-1-CE-4_VPRN1" create
                address 172.16.114.1/30
                sap 1/1/3:1 create
                exit
            exit
            bgp-ipvpn
                mpls
                    auto-bind-tunnel
                        resolution any
                    exit
                    route-distinguisher 64496:1
                    vrf-target target:64496:1
                    no shutdown
                exit
            exit
            bgp
                split-horizon
                group "EBGP_1"
                    next-hop-self
                    import "import-bgp-LP200"
                    export "export-bgp"
                    peer-as 64500
                    neighbor 172.16.114.2
                    exit
                exit
            exit
            export-inactive-bgp                 # BGP best-external in VPRN
            no shutdown

The import policy sets the LP to 200 for the routes received from CE-4. The configuration on PE-2 is similar, but without an import policy. Therefore, the path via PE-1 is preferred over the path via PE-2.

The export-inactive-bgp option must be configured on PE-2, because the route for prefix 172.31.0.0/16 received by PE-2 from CE-4 is inactive, but should still be advertised as BGP VPN-IPv4 route to RR-5; see chapter BGP Best-External in a VPRN. In this example, the export-inactive-bgp option is configured on all PEs.

On the CEs, the configuration is either in the base routing instance—with additional router interfaces and BGP neighbors—or in a VPRN. In this example, the following VPRN is configured on CE-4:

# on CE-4:
configure
    router Base
        policy-options
            begin
            prefix-list "172.31.0.0/16"
                prefix 172.31.0.0/16 longer
            exit
            policy-statement "export_172.31.0.0/16"
                entry 10
                    from
                        prefix-list "172.31.0.0/16"
                    exit
                    action accept
                    exit
                exit
            exit
            commit
        exit
    exit
    service
        vprn 1 name "VPRN 1" customer 1 create
            autonomous-system 64500
            route-distinguisher 64500:1
            interface "int-CE-4-PE-1_VPRN1" create
                address 172.16.114.2/30
                sap 1/1/1:1 create
                exit
            exit
            interface "int-CE-4-PE-2_VPRN1" create
                address 172.16.124.2/30
                sap 1/1/2:1 create
                exit
            exit
            interface "test_connectedNW" create
                address 172.31.0.1/16
                loopback
            exit
            bgp
                split-horizon
                group "EBGP_1"
                    export "export_172.31.0.0/16"
                    peer-as 64496
                    neighbor 172.16.114.1
                    exit
                    neighbor 172.16.124.1
                    exit
                exit
            exit
            no shutdown

The configuration on CE-6 is similar.

For all BGP speakers in AS 64496, BGP must be configured for address family VPN-IPv4 as well as for IPv4, as follows:

# on PE-1, PE-2, PE-3:
configure 
    router Base
        bgp 
            group "IBGP" 
                family ipv4 vpn-ipv4

BGP add-path cannot be enabled in the bgp context within a VPRN. However, BGP add-path can be enabled in the base routing instance for address family VPN-IPv4. This is done on all PEs at group level with the following command:

# on all PEs:
configure 
    router Base
        bgp 
            group "IBGP" 
                add-paths 
                    vpn-ipv4 send 2 receive

In this example, BGP add-path is enabled at neighbor level on RR-5, as follows:

# on RR-5:
configure
    router Base
        bgp
            group "IBGP"
                neighbor 192.0.2.1
                    add-paths
                        vpn-ipv4 send 2 receive
                    exit
                exit
                neighbor 192.0.2.2
                    add-paths
                        vpn-ipv4 send 2 receive
                    exit
                exit
                neighbor 192.0.2.3
                    add-paths
                        vpn-ipv4 send 2 receive
                    exit
                exit

The BGP configuration for group "IBGP" on PE-1 is as follows:

*A:PE-1# configure router bgp group "IBGP" 
*A:PE-1>config>router>bgp>group# info 
----------------------------------------------
                family ipv4 vpn-ipv4
                next-hop-self
                peer-as 64496
                add-paths
                    ipv4 send 2 receive
                    vpn-ipv4 send 2 receive
                exit
                neighbor 192.0.2.5
                exit
----------------------------------------------

With add-path enabled for address family VPN-IPv4, PE-1 and PE-2 will advertise their route for prefix 172.31.0.0/16 as VPN-IPv4 route to RR-5. RR-5 will advertise both routes to its other RR clients. PE-3 receives two VPN-IPv4 routes for prefix 172.31.0.0/16, as follows:

*A:PE-3# show router bgp routes 172.31.0.0/16 vpn-ipv4
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP VPN-IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  64496:1:172.31.0.0/16                              200         None
      192.0.2.1                                          3           10
      64500                                                          524284
ub*i  64496:1:172.31.0.0/16                              100         None
      192.0.2.2                                          15          10
      64500                                                          524283
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

Both routes are used: the route via PE-1 is the active route and the route via PE-2 is used as a backup, as indicated by the "b" flag.
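
In the output above, both routes carry the same route distinguisher (64496:1), so they share one VPN-IPv4 prefix and differ only in next hop, path ID, and label. The following Python sketch separates the route distinguisher from the IPv4 prefix; `split_vpn_ipv4` is an illustrative helper and assumes the textual format shown in the output.

```python
import ipaddress


def split_vpn_ipv4(route):
    """Split 'RD:prefix/len' into (route distinguisher, IPv4 network)."""
    # The IPv4 prefix contains no colon, so the last colon ends the RD.
    rd, _, prefix = route.rpartition(":")
    return rd, ipaddress.ip_network(prefix)


rd, network = split_vpn_ipv4("64496:1:172.31.0.0/16")
print(rd, network)  # 64496:1 172.31.0.0/16
```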

The routing table for VPRN 1 on PE-3 shows that there is a backup route for prefix 172.31.0.0/16, as indicated by "B" as follows:

*A:PE-3# show router 1 route-table 172.31.0.0/16

===============================================================================
Route Table (Service: 1)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
172.31.0.0/16 [B]                             Remote  BGP VPN   00h00m32s  170
       192.0.2.1 (tunneled)                                         10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

The active route and the alternative (backup) route are shown in the following output:

*A:PE-3# show router 1 route-table 172.31.0.0/16 alternative

===============================================================================
Route Table (Service: 1)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                   Metric
      Alt-NextHop                                                Alt-
                                                                 Metric
-------------------------------------------------------------------------------
172.31.0.0/16                                 Remote  BGP VPN   00h00m32s  170
       192.0.2.1 (tunneled)                                         10
172.31.0.0/16 (Backup)                        Remote  BGP VPN   00h00m32s  170
       192.0.2.2 (tunneled)                                         10
-------------------------------------------------------------------------------
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
       Backup = BGP backup route
       LFA = Loop-Free Alternate nexthop
       S = Sticky ECMP requested
===============================================================================

BGP FRR is disabled in VPRN 1 on the PEs, as follows:

# on PE-1, PE-2, PE-3:
configure 
    service 
        vprn "VPRN 1"
            no enable-bgp-vpn-backup

Add-path for family VPN-IPv4 with ECMP

The import policy is removed in VPRN 1 on PE-1 to make the cost of the paths via PE-1 and PE-2 equal, as follows:

# on PE-1:
configure
    service
        vprn "VPRN 1"
            bgp 
                group "EBGP_1"
                    no import

ECMP is enabled in VPRN 1 on all PEs, as follows:

# on PE-1, PE-2, PE-3:
configure 
    service 
        vprn "VPRN 1"
            ecmp 2

BGP multipath must also be enabled in the base routing context; this was already configured in the section Add-path and ECMP.

With ECMP enabled, the two routes that are received on PE-3 from RR-5 are both active, as follows:

*A:PE-3# show router bgp routes 172.31.0.0/16 vpn-ipv4
===============================================================================
 BGP Router ID:192.0.2.3        AS:64496       Local AS:64496
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP VPN-IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  64496:1:172.31.0.0/16                              100         None
      192.0.2.1                                          3           10
      64500                                                          524284
u*>i  64496:1:172.31.0.0/16                              100         None
      192.0.2.2                                          15          10
      64500                                                          524283
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

ECMP is enabled with a value of two, so traffic flows in VPRN 1 on PE-3 with destination 172.31.0.0/16 are distributed over two paths: one via PE-1 and another via PE-2, as follows:

*A:PE-3# show router 1 route-table 172.31.0.0/16

===============================================================================
Route Table (Service: 1)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
172.31.0.0/16                                 Remote  BGP VPN   00h00m48s  170
       192.0.2.1 (tunneled)                                         10
172.31.0.0/16                                 Remote  BGP VPN   00h00m48s  170
       192.0.2.2 (tunneled)                                         10
-------------------------------------------------------------------------------
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Conclusion

BGP add-path allows BGP speakers to advertise multiple distinct paths for the same prefix. The potential benefits of BGP add-path include reduced routing churn, faster convergence, and better load-sharing.