BGP

Border Gateway Protocol (BGP) is an inter-AS routing protocol. An Autonomous System (AS) is a network or a group of routers logically organized and controlled by common network administration. BGP enables routers to exchange network reachability information, including information about other ASs that traffic must traverse to reach other routers in other ASs.

ASs share routing information, such as routes to each destination and information about the route or AS path, with other ASs using BGP. Routing tables contain lists of known routers, reachable addresses, and associated path cost metrics for each router. BGP uses the information and path attributes to compile a network topology.

To set up BGP routing, each participating router must have BGP enabled and be assigned to an AS, and its neighbor (peer) relationships must be specified. A router typically belongs to only one AS.

This section describes the minimal configuration necessary to set up BGP on SR Linux. This includes the following:

  • Global BGP configuration, including specifying the Autonomous System Number (ASN) of the router, as well as the router ID.

  • BGP peer group configuration, which specifies settings that are applied to BGP neighbor routers in the peer group.

  • BGP neighbor configuration, which specifies the peer group to which each BGP neighbor belongs, as well as settings specific to the neighbor, including the AS to which the router is peered.

For information about all other BGP settings, see the SR Linux online help, as well as the SR Linux Advanced Solutions Guide and the SR Linux Data Model Reference.

BGP global configuration

Global BGP configuration includes specifying the Autonomous System Number (ASN) of the router and the router ID.

Configuring an ASN

You can configure an Autonomous System Number (ASN) for a router. An ASN is a globally unique value that associates a router to a specific AS. Each router participating in BGP must have an ASN specified.

The following example configures an ASN for a router:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                autonomous-system 65002
            }
        }
    }

Configuring the router ID

The router ID, expressed like an IP address, uniquely identifies the router and indicates the origin of a packet for routing information exchanged between ASs. The router ID is configured at the BGP level.

The following example configures a router ID:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                router-id 2.2.2.2
            }
        }
    }

Configuring a BGP peer group

As part of BGP configuration, you configure a BGP peer group. A BGP peer group is a collection of related BGP neighbors. The group name should be a descriptive name for the group.

All parameters configured for a peer group are inherited by each peer (neighbor) in the peer group, but a group parameter can be overridden for specific neighbors in the configuration of that neighbor.

The following example configures the administrative state and trace options for a BGP peer group. These settings apply to all of the BGP neighbors that are members of this group, unless specifically overridden in the neighbor configuration.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                group headquarters1 {
                    admin-state enable
                    traceoptions {
                        flag events {
                        }
                        flag graceful-restart {
                        }
                    }
                }
            }
        }
    }

Configuring BGP neighbors

After configuring a BGP peer group and assigning options, you add neighbors within the same AS to create internal BGP (iBGP) connections, neighbors in a different AS to create external BGP (eBGP) peers, or both. All parameters configured for the peer group to which the neighbor is assigned are applied to the neighbor, but a peer group parameter can be overridden on a specific neighbor basis.

The following example configures parameters for two BGP neighbors. The peer-group parameter configures both nodes to use the settings specified for the headquarters1 peer group. The peer group settings apply unless they are specifically overridden in the neighbor configuration.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    peer-group headquarters1
                    description "default network-instance bgp neighbor to Node A"
                    peer-as 65001
                    local-as as-number 65002 {
                    }
                    multihop {
                        admin-state enable
                        maximum-hops 3
                    }
                    failure-detection {
                        enable-bfd true
                        fast-failover true
                    }
                }
                neighbor 192.168.13.2 {
                    peer-group headquarters1
                    description "default network-instance bgp neighbor to Node C"
                    peer-as 65003
                    local-as as-number 65002 {
                    }
                    failure-detection {
                        enable-bfd true
                        fast-failover true
                    }
                }
            }
        }
    }

BGP peer import and export policies

SR Linux supports BGP import policies and export policies:

  • An import policy is a sequence of match conditions and action rules that are run on certain routes received from BGP peers. If a received route is rejected by an import policy rule, then depending on the address family and BGP configuration options, the route may be discarded or it may be stored in the RIB but considered invalid and not considered during best-path selection.

  • An export policy is a sequence of match conditions and action rules that are run on routes that have been selected for advertisement to a BGP peer. If a route that would normally be advertised to a peer (RIB-OUT) is rejected by an export policy rule, then the actual advertisement of the route to the peer is blocked.

BGP import and export policy rules

You can configure BGP import or export policies at the BGP global, peer-group, and neighbor levels. The policies operate according to the following rules:

For BGP import policies:

  • If the configuration of a BGP neighbor explicitly specifies an import policy, then this is the policy used to filter inbound routes from the peer, and import policies defined at higher levels of configuration (group or BGP instance as a whole) are ignored.
  • If the configuration of a BGP neighbor does not specify an import policy, but the peer group to which it belongs does specify an import policy, then the peer-group import policy is used to filter inbound routes from the peer, and the import policy defined at the BGP instance level is ignored.
  • If the configuration of a BGP neighbor does not specify an import policy, and the peer group to which it belongs also does not specify an import policy, then the BGP-instance import-policy is used to filter inbound routes from the peer.
  • If there is no import-policy at any level of configuration that applies to a BGP neighbor, then the handling of received routes depends on the peer session type, as follows:
    • If the peer session type is iBGP, then all received routes are accepted.
    • If the peer session type is eBGP, then all received routes are rejected by default; however, this can be changed by configuring the ebgp-default-policy import-reject-all setting to false.

For BGP export policies:

  • If the configuration of a BGP neighbor explicitly specifies an export policy, then this is the policy used to filter outbound routes sent to the peer, and export policies defined at higher levels of configuration (group or BGP instance as a whole) are ignored.
  • If the configuration of a BGP neighbor does not specify an export policy, but the peer group to which it belongs does specify an export policy, then the peer-group export policy is used to filter outbound routes sent to the peer, and the export policy defined at the BGP instance level is ignored.
  • If the configuration of a BGP neighbor does not specify an export policy, and the peer group to which it belongs also does not specify an export policy, then the BGP-instance export policy is used to filter outbound routes sent to the peer.
  • If there is no export policy at any level of configuration that applies to a BGP neighbor, then the advertisement of routes in the local RIB-IN depends on the RIB-IN type and the peer session type as follows:
    • If the peer session type is iBGP, then all non-imported BGP RIB-INs are accepted and therefore eligible for advertisement.
    • If the peer session type is eBGP, then all routes are rejected by default, meaning that no routes are eligible for advertisement; however, this can be changed by configuring the ebgp-default-policy export-reject-all setting to false, in which case all non-imported BGP RIB-INs are accepted and eligible for advertisement.

AFI-SAFI policy attachment rules

The OpenConfig BGP model supports the attachment of AFI-SAFI-specific route policies to selected peers, peer groups, or to the BGP instance as a whole. This implies, for example, that for a single peer you could have one export policy for IPV4_UNICAST routes advertised to the peer, and a different export policy for IPV6_UNICAST routes advertised to the peer. To accommodate this kind of configuration, SR Linux includes the following contexts supporting attachment of import and export policies:

  • network-instance.protocols.bgp
  • network-instance.protocols.bgp.afi-safi
  • network-instance.protocols.bgp.group
  • network-instance.protocols.bgp.group.afi-safi
  • network-instance.protocols.bgp.neighbor
  • network-instance.protocols.bgp.neighbor.afi-safi

The policy that applies to a route is based on the following order of priority:

  1. AFI-SAFI at neighbor level

  2. AFI-SAFI at group level

  3. AFI-SAFI at instance level

  4. General policy at neighbor level

  5. General policy at group level

  6. General policy at instance level

  7. Default policy
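
The priority order above, together with the default import handling described earlier for neighbors with no policy, can be sketched in a few lines of Python. This is an illustrative model only, not SR Linux code: the `attached` dictionary layout and the function names are hypothetical stand-ins for the configuration contexts listed above.

```python
from typing import Optional

# Contexts in the priority order listed above, highest priority first.
PRIORITY = [
    ("neighbor", "afi-safi"),
    ("group", "afi-safi"),
    ("instance", "afi-safi"),
    ("neighbor", "general"),
    ("group", "general"),
    ("instance", "general"),
]


def effective_policy(attached: dict) -> Optional[str]:
    """Return the first policy found in priority order, or None.

    `attached` maps (level, kind) to a policy name. This layout is a
    hypothetical stand-in for the real configuration tree.
    """
    for context in PRIORITY:
        policy = attached.get(context)
        if policy is not None:
            return policy
    return None  # no policy attached anywhere: the default policy applies


def default_import_action(session_type: str, import_reject_all: bool = True) -> str:
    """Default handling of received routes when no import policy applies."""
    if session_type == "ibgp":
        return "accept"
    # eBGP: rejected by default unless import-reject-all is set to false.
    return "reject" if import_reject_all else "accept"


if __name__ == "__main__":
    attached = {
        ("group", "afi-safi"): "group-ipv4-import",
        ("neighbor", "general"): "neighbor-import",
    }
    print(effective_policy(attached))
    print(default_import_action("ebgp"))
```

In this sketch an AFI-SAFI policy at the group level wins over a general policy at the neighbor level, which matches items 2 and 4 in the list.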

eBGP multihop

External BGP (eBGP) multihop can be used to form adjacencies when eBGP neighbors are not directly connected to each other; for example, when a non-BGP router is between the eBGP neighbors.

BGP TCP/IP packets sent toward an eBGP neighbor by default have a TTL value of 1. If the BGP TCP/IP packets need to pass through more than one router to reach their destination, the TTL decrements to 0, and the packets are dropped.

To prevent this, you can enable multihop for the eBGP neighbor and specify the maximum number of hops for BGP TCP/IP packets sent to the neighbor. This allows the eBGP neighbor to be indirectly connected by up to the specified number of hops.

When multihop is not enabled, the IP TTL for eBGP sessions is set to 1, and the IP TTL for iBGP sessions is set to 64. Enabling multihop and configuring the maximum number of hops to a neighbor allows an eBGP session to span multiple hops or, if required, restricts an iBGP session to a single hop.

If multihop is enabled and the maximum-hops parameter is configured for a BGP peer group, the settings are applied to the members of the group. If the multihop configuration for a neighbor is changed, the session with the neighbor must be disconnected and re-established for the change to take effect.
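
The TTL behavior described above can be illustrated with a small Python model. It is a sketch under stated assumptions: it assumes that with multihop enabled the initial TTL equals the maximum-hops value, and that each intermediate router decrements the TTL by one and drops the packet when the result is 0. The function names are hypothetical.

```python
def initial_ttl(session_type: str, multihop_enabled: bool = False,
                maximum_hops: int = 1) -> int:
    """Initial IP TTL for BGP packets sent toward a neighbor."""
    if multihop_enabled:
        # Assumption: the configured maximum-hops value is used as the TTL.
        return maximum_hops
    return 1 if session_type == "ebgp" else 64


def reaches_neighbor(ttl: int, intermediate_routers: int) -> bool:
    """True if a packet with this TTL survives the routers in between."""
    for _ in range(intermediate_routers):
        ttl -= 1          # each forwarding router decrements the TTL
        if ttl == 0:
            return False  # TTL expired before reaching the neighbor
    return True


if __name__ == "__main__":
    # Default eBGP session with one router in between: the packet is dropped.
    print(reaches_neighbor(initial_ttl("ebgp"), intermediate_routers=1))
    # Multihop with maximum-hops 2: the same path now works.
    print(reaches_neighbor(initial_ttl("ebgp", True, 2), intermediate_routers=1))
```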

Configuring eBGP multihop

To configure eBGP multihop, you enable it for the eBGP neighbor, and specify a value for the maximum-hops parameter. Additionally, the next-hop to the neighbor must be configured so that the two systems can establish a BGP session.

Enable multihop for an eBGP neighbor

The maximum-hops parameter is set to 2, which increases the TTL for BGP TCP/IP packets sent toward the eBGP neighbor, allowing the neighbor to be indirectly connected by up to 2 hops.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    multihop {
                        admin-state enable
                        maximum-hops 2
                    }
                }
            }
        }
    }

Configure a route to the next-hop toward the eBGP neighbor

--{ * candidate shared default }--[  ]--
# info network-instance default static-routes
    network-instance default {
        static-routes {
            route 192.168.11.0/24 {
                next-hop-group static-ipv4-grp
            }
        }
    }
--{ * candidate shared default }--[  ]--
# info network-instance default next-hop-groups group static-ipv4-grp
    network-instance default {
        next-hop-groups {
            group static-ipv4-grp {
                nexthop 1 {
                    ip-address 192.168.22.22
                }
            }
        }
    }

AS path options

You can set the following options for handling the AS_PATH in received BGP routes:

  • Allow own AS – configures the router to process received routes when its own ASN appears in the AS_PATH.

  • Replace peer AS – configures the router to replace the ASN of the peer router in the AS_PATH with its own ASN.

  • Remove private AS path numbers – configures the router to either delete private AS numbers, shortening the AS path length, or replace private AS numbers with the local AS number used toward the peer, maintaining the AS path length.

Configuring allow-own-as

You can use the allow-own-as option to configure the router to process received routes when its own ASN appears in the AS_PATH. Normally, when the ASN of a router appears in the AS_PATH of a received route, it is considered a loop, and the route is discarded. The allow-own-as option sets the maximum number of times the global ASN of the router can appear in a received AS_PATH before the route is considered a loop and treated as invalid. The default is 0.

The following example configures the router to process received routes where its own ASN appears in the AS_PATH a maximum of 1 time:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                autonomous-system 65001
                as-path-options {
                    allow-own-as 1
                }
            }
        }
    }
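
The loop check that allow-own-as relaxes can be sketched as follows. This is an illustrative Python model, not the router implementation. The AS_PATH is represented as a plain list of ASNs and the function name is hypothetical.

```python
def is_as_path_loop(as_path: list, own_asn: int, allow_own_as: int = 0) -> bool:
    """Return True if the route should be treated as a loop.

    The route is a loop when the router's own ASN appears in the
    AS_PATH more times than the allow-own-as value permits.
    """
    return as_path.count(own_asn) > allow_own_as


if __name__ == "__main__":
    path = [65002, 65001, 65003]
    print(is_as_path_loop(path, own_asn=65001))                  # default of 0
    print(is_as_path_loop(path, own_asn=65001, allow_own_as=1))  # one occurrence allowed
```

With the default of 0, any occurrence of the local ASN marks the route as a loop. With a value of 1, as in the configuration example, a single occurrence is tolerated.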

Configuring replace-peer-as

You can configure the router to replace the peer ASN in AS_PATH with its own ASN. Normally, two sites having the same ASN would not be able to reach each other directly because the receiving router would see its own ASN in the AS_PATH and consider it a loop. You can overcome this by configuring the router to replace the peer ASN in the AS_PATH with its own ASN. When the replace-peer-as option is set to true, the router replaces every occurrence of the peer AS number that is present in the advertised AS_PATH with the local ASN used toward the peer.

The following example configures the router to replace the ASN of the peer with its own ASN:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    replace-peer-as true
                }
            }
        }
    }
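
The replacement step performed by replace-peer-as can be sketched in Python. This is an illustrative model only: the AS_PATH is a plain list of ASNs, the function name is hypothetical, and the normal prepending of the local ASN on eBGP advertisement is deliberately left out.

```python
def replace_peer_as(as_path: list, peer_asn: int, local_asn: int) -> list:
    """Replace every occurrence of the peer ASN with the local ASN."""
    return [local_asn if asn == peer_asn else asn for asn in as_path]


if __name__ == "__main__":
    # Two sites share ASN 65010. Before advertising to one of them,
    # the router (local ASN 65000) rewrites the path so that the
    # receiving site does not see its own ASN and reject the route.
    print(replace_peer_as([65010, 65020, 65010], peer_asn=65010, local_asn=65000))
```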

Configuring remove-private-as

You can configure how the router handles private AS numbers: either delete them, shortening the AS path length, or replace private AS numbers with the local ASN used toward the peer, which maintains the AS path length.

You can configure the router to delete or replace private AS numbers that appear before the first occurrence of a non-private ASN in the sequence of most recent ASNs in the AS path. You can also configure the router to ignore private AS numbers when they are the same as the peer ASN.

Configure the router to delete private AS numbers

The following example configures the router to delete private AS numbers (2-byte and 4-byte) from the advertised AS path toward all peers. This shortens the AS path.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode delete
                    }
                }
            }
        }
    }

The following example configures the router to replace private AS numbers with the local ASN used toward the peer. This keeps the AS path the same length.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                    }
                }
            }
        }
    }

The following example configures the router to replace only private AS numbers that appear before the first occurrence of a non-private ASN in the sequence of most recent ASNs in the AS path.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                        leading-only true
                    }
                }
            }
        }
    }

The following example configures the router to ignore private AS numbers (not replace them) when they are the same as the peer AS number.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                        ignore-peer-as true
                    }
                }
            }
        }
    }
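
The four remove-private-as variants above can be compared with a small Python model. It is a sketch under stated assumptions, not the SR Linux implementation. The private ranges are those reserved by RFC 6996 (64512 to 65534 for 2-byte ASNs and 4200000000 to 4294967294 for 4-byte ASNs). The function names and the list representation of the AS path (most recent ASN first) are hypothetical, and the sketch assumes that a private ASN skipped because of ignore-peer-as does not end the leading run.

```python
PRIVATE_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_private(asn: int) -> bool:
    """True if the ASN falls in a private-use range (RFC 6996)."""
    return any(low <= asn <= high for low, high in PRIVATE_RANGES)


def remove_private_as(as_path, mode, local_asn, peer_asn=None,
                      leading_only=False, ignore_peer_as=False):
    """Delete or replace private ASNs in an advertised AS path."""
    if mode not in ("delete", "replace"):
        raise ValueError("mode must be 'delete' or 'replace'")
    result = []
    in_leading_run = True  # still before the first non-private ASN
    for asn in as_path:
        private = is_private(asn)
        if not private:
            in_leading_run = False
        eligible = private and (in_leading_run or not leading_only)
        if eligible and ignore_peer_as and asn == peer_asn:
            eligible = False  # leave the peer's own private ASN alone
        if not eligible:
            result.append(asn)
        elif mode == "replace":
            result.append(local_asn)  # keeps the path length
        # mode == "delete": the ASN is dropped, shortening the path
    return result


if __name__ == "__main__":
    path = [64512, 65010, 64496, 64600]  # 64496 is the only non-private ASN
    print(remove_private_as(path, "delete", local_asn=64500))
    print(remove_private_as(path, "replace", local_asn=64500, leading_only=True))
```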

BGP MED

The Multi-Exit Discriminator (MED) attribute is an optional attribute that can be added to routes advertised to an eBGP peer to influence the flow of inbound traffic to the AS. The MED attribute carries a 32-bit metric value. A lower metric is better than a higher metric when MED is compared by the BGP decision process.

By default, the MED attribute is compared only if the routes come from the same neighbor AS. You can optionally configure SR Linux to compare the MED value from different ASs when selecting the best route.

Configuring always-compare-med

To configure SR Linux to use MED values from different ASs in the BGP decision process (tie-break between routes for the same NLRI), set the always-compare-med option to true. By default, this option is set to false, which uses MED values in the BGP decision process only for routes from the same neighbor AS.

The following example sets the always-compare-med option to true:

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp best-path-selection
    network-instance default {
        protocols {
            bgp {
                best-path-selection {
                    always-compare-med true
                }
            }
        }
    }
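
The effect of always-compare-med on the MED step of the decision process can be sketched as follows. This Python model is illustrative only: routes are plain dictionaries with hypothetical `neighbor_as` and `med` keys, both routes are assumed to carry a MED, and all other tie-break steps are omitted.

```python
def med_preference(route_a: dict, route_b: dict, always_compare_med: bool = False):
    """Return the route preferred by MED, or None if MED does not decide."""
    same_neighbor_as = route_a["neighbor_as"] == route_b["neighbor_as"]
    if not (always_compare_med or same_neighbor_as):
        return None  # MED is skipped for routes from different neighbor ASs
    if route_a["med"] == route_b["med"]:
        return None  # equal MED: later tie-break steps decide
    # A lower metric is better than a higher metric.
    return route_a if route_a["med"] < route_b["med"] else route_b


if __name__ == "__main__":
    a = {"name": "via-AS65001", "neighbor_as": 65001, "med": 50}
    b = {"name": "via-AS65003", "neighbor_as": 65003, "med": 10}
    print(med_preference(a, b))                           # default behavior
    print(med_preference(a, b, always_compare_med=True))  # MED compared across ASs
```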

BGP AIGP metric

Note: The BGP AIGP metric is supported only on 7250 IXR platforms.

The accumulated IGP (AIGP) metric is an optional non-transitive attribute that can be attached to selected routes to influence the BGP decision process to prefer BGP paths with a lower end-to-end IGP cost, even when the compared paths span more than one AS or IGP instance. AIGP is different from MED in several important ways:

  • AIGP is not intended to be transitive between completely distinct autonomous systems (only across internal AS boundaries).

  • AIGP is always compared in paths that have the attribute, whether they come from a different neighbor AS or not.

  • AIGP is more important than MED in the BGP decision process.

  • AIGP is automatically incremented every time there is a BGP next-hop change so that it can track the end-to-end IGP cost, whereas all arithmetic operations on MED attributes must be done manually (for example, using route policies).

In the SR Linux implementation, AIGP is supported only in the base router BGP instance and only for the following types of routes: IPv4 unicast, labeled IPv4 unicast, IPv6 unicast, and labeled IPv6 unicast. When AIGP is enabled for an address family, the AIGP attribute is sent to all peers, except for peers or groups configured with the block-accumulated-igp command. For the purpose of best path selection, all routes from a blocked peer are treated as though they were received without any AIGP attribute. If the AIGP attribute is received from a peer that is not configured for AIGP, or if the attribute is received in a non-supported route type, the attribute is discarded and not propagated to other peers (but it is still displayed in BGP info from state commands).

When a router receives a route with an AIGP attribute and it re-advertises the route to an AIGP-enabled peer without any change to the BGP next hop, the AIGP metric value is unchanged by the advertisement (RIB-OUT) process. But if the route is re-advertised with a new BGP next hop, the AIGP metric value is automatically incremented by the route table (or tunnel table) cost to reach the received BGP next hop.
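
The re-advertisement rule above amounts to a single conditional addition, sketched here in Python. The function and parameter names are hypothetical, and the cost value stands for the route table (or tunnel table) cost to reach the received BGP next hop.

```python
def advertised_aigp(received_aigp: int, next_hop_changed: bool,
                    cost_to_received_next_hop: int) -> int:
    """AIGP metric value placed in the re-advertised route."""
    if not next_hop_changed:
        return received_aigp  # next hop unchanged: metric passes through
    # New BGP next hop: add the cost to reach the received next hop.
    return received_aigp + cost_to_received_next_hop


if __name__ == "__main__":
    print(advertised_aigp(100, next_hop_changed=False, cost_to_received_next_hop=20))
    print(advertised_aigp(100, next_hop_changed=True, cost_to_received_next_hop=20))
```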

Note: No route policy configuration is required to advertise the AIGP attribute to peers. When AIGP is enabled for an address family, the router advertises the AIGP attribute to all peers in that family (unless they are explicitly blocked).

Configuring BGP AIGP metric

To enable BGP AIGP for all routes of an address family, use the protocols bgp afi-safi best-path-selection accumulated-igp command.

Configure BGP AIGP metric

In the following example, AIGP is enabled for IPv4 unicast and labeled unicast address families. AIGP is also supported for IPv6 unicast and labeled unicast address families.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-labeled-unicast {
                    best-path-selection {
                        accumulated-igp true
                    }
                }
                afi-safi ipv4-unicast {
                    best-path-selection {
                        accumulated-igp true
                    }
                }
            }
        }
    }

Route reflection

In a standard iBGP configuration, all BGP speakers within an AS must be fully meshed to ensure that all externally learned routes are redistributed through the entire AS.

Configuring route reflection provides an alternative to the full BGP mesh requirement: instead of peering with all other iBGP routers in the network, each iBGP router only peers with a router configured as a route reflector.

An AS can be divided into multiple clusters, with each cluster containing at least one route reflector, which redistributes routes to the clients in the cluster. The clients within the cluster do not need to maintain a full peering mesh between each other. They only require a peering to the route reflectors in their cluster. The route reflectors must maintain a full peering mesh between all non-clients within the AS.
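
To give a sense of the scale difference, the following Python sketch compares the number of iBGP sessions in a full mesh with the number needed when every other router is a client of a single route reflector. This is simple arithmetic for illustration. The function names are hypothetical, and the single-reflector, single-cluster topology is an assumption.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def single_reflector_sessions(routers: int) -> int:
    """iBGP sessions when all other routers are clients of one reflector."""
    return routers - 1


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, full_mesh_sessions(n), single_reflector_sessions(n))
```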

Configuring route reflection

To configure a route reflector, you assign it a cluster ID and specify which neighbors are clients and which are non-clients. Clients receive reflected routes, and non-clients are treated as standard iBGP peers.

The following example configures the router to be a route reflector for two clients SRL-1 and SRL-2. The router is assigned cluster ID 0.0.0.1.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                route-reflector {
                    cluster-id 0.0.0.1
                }
                neighbor SRL-1 {
                    route-reflector {
                        cluster-id 0.0.0.1
                        client true
                    }
                }
                neighbor SRL-2 {
                    route-reflector {
                        cluster-id 0.0.0.1
                        client true
                    }
                }
            }
        }
    }

BGP add-path

Normally, if the SR Linux device receives an advertisement of an NLRI and path from a specific peer, and that peer subsequently advertises the same NLRI with different path information (a different next-hop, different path attributes, or both), the new path effectively overwrites the existing path.

If BGP add-path has been negotiated with the peer, there is a different behavior: the newly advertised path is stored in the RIB-IN along with all of the paths previously advertised (and not withdrawn) by the peer.

For router A to receive multiple paths per NLRI from peer B for a particular address family, the BGP capabilities advertisement during session setup must indicate that peer B wants to send multiple paths for the address family, and that router A is willing to receive multiple paths for the address family.

When the add-path receive capability for an address family has been negotiated with a peer, all advertisements and withdrawals of NLRI within that address family by that peer include a path identifier.

  • If the combination of NLRI and path identifier in an advertisement from a peer is unique (does not match an existing route in the RIB-IN from that peer), the route is added to the RIB-IN.
  • If the combination of NLRI and path identifier in a received advertisement is the same as an existing route in the RIB-IN from the peer, the new route replaces the existing one.
  • If the combination of NLRI and path identifier in a received withdrawal matches an existing route in the RIB-IN from the peer, that route is removed from the RIB-IN.
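
The three rules above describe a table keyed by the combination of NLRI and path identifier. The following Python sketch models the RIB-IN for a single peer in that way. It is illustrative only. The class and method names are hypothetical, and path attributes are reduced to a plain dictionary.

```python
class PeerRibIn:
    """RIB-IN for one peer with add-path receive negotiated."""

    def __init__(self):
        self.routes = {}  # (nlri, path_id) -> path attributes

    def advertise(self, nlri: str, path_id: int, attrs: dict) -> None:
        # A new (NLRI, path ID) key adds a route.
        # An existing key is replaced by the new route.
        self.routes[(nlri, path_id)] = attrs

    def withdraw(self, nlri: str, path_id: int) -> None:
        # Only the route matching both NLRI and path ID is removed.
        self.routes.pop((nlri, path_id), None)

    def paths(self, nlri: str) -> list:
        """All paths currently stored for an NLRI."""
        return [attrs for (n, _), attrs in self.routes.items() if n == nlri]


if __name__ == "__main__":
    rib = PeerRibIn()
    rib.advertise("10.0.0.0/24", 1, {"next_hop": "192.168.11.1"})
    rib.advertise("10.0.0.0/24", 2, {"next_hop": "192.168.13.2"})
    print(len(rib.paths("10.0.0.0/24")))
    rib.withdraw("10.0.0.0/24", 1)
    print(rib.paths("10.0.0.0/24"))
```

Without add-path, the key would be the NLRI alone, which is why a second advertisement from the same peer overwrites the first.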

BGP add-path is supported by BGP running in the default network-instance and BGP running in any IP-VRF network-instance.

BGP add-path is configurable per address family at the network-instance, group, and neighbor levels. Inheritance of add-path configuration from network-instance to group to neighbor is per address family. The following address families are supported:

  • IPv4 unicast

  • IPv6 unicast

  • Layer 3 VPN IPv4 unicast

  • Layer 3 VPN IPv6 unicast

  • IPv4 labeled unicast

  • IPv6 labeled unicast

  • EVPN

  • Route target

Configuring BGP add-path

SR Linux supports the following add-path options:
  • receive – Negotiate with a peer to receive multiple path advertisements from a single peer for a single NLRI belonging to the address family.
  • send – Negotiate with a peer to send multiple path advertisements to a single peer for a single NLRI belonging to the address family.
  • send-max – Send the best paths for a single NLRI, up to a configured maximum, or as many as possible until there are no more valid paths to send.
  • send-multipath – Send the used paths for a single NLRI, including all paths that are multipaths.
  • eligible-prefix-policy – Control add-path send behavior using a routing policy. This option is not supported at the group or neighbor levels.

Enable BGP add-path send for an address family

The following example enables the SR Linux device to negotiate with a BGP peer to send multiple path advertisements for a single NLRI belonging to an address family:

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 1.1.1.1
    network-instance default {
        protocols {
            bgp {
                neighbor 1.1.1.1 {
                    afi-safi ipv4-unicast {
                        add-paths {
                            send true
                        }
                    }
                }
            }
        }
    }

Send up to a maximum number of paths

The following example enables the SR Linux device to send up to 10 advertisements for a single NLRI belonging to the IPv4 unicast address family:

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp afi-safi ipv4-unicast
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-unicast {
                    add-paths {
                        send-max 10
                    }
                }
            }
        }
    }

Use a routing policy to control BGP add-path behavior

The following example configures a routing policy that matches prefixes in a prefix-set with a policy-result action of accept. The routing policy is specified in the add-paths configuration to control the BGP add-path send behavior for matching prefixes.

--{ * candidate shared default }--[  ]--
# info routing-policy
    routing-policy {
        prefix-set pset1 {
            prefix 10.3.192.0/21 mask-length-range 21..24 {
            }
            prefix 10.3.191.0/21 mask-length-range exact {
            }
        }
        policy ap1 {
            statement st1 {
                match {
                    prefix-set pset1
                }
                action {
                    policy-result accept
                }
            }
        }
    }
--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp afi-safi ipv4-unicast
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-unicast {
                    add-paths {
                        eligible-prefix-policy ap1
                    }
                }
            }
        }
    }

The routing policy referenced by eligible-prefix-policy can have the following match conditions:

  • prefix-set
  • family
  • community set

The action in the routing policy can be accept or reject. Route property modification actions in the routing policy are ignored.

Note:
  • If no routing policy is configured to control add-path send behavior, add-path capability is advertised for all prefixes for the specified address family.

  • If a routing policy is configured, but there is no match, add-path capability is advertised for the prefix according to the afi-safi configuration.

  • If the routing policy is matched, and the action is accept, add-path capability is advertised for the prefix according to the afi-safi configuration.

  • If the routing policy is matched, and the action is reject, add-path capability is not advertised for the prefix.
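
The four cases in the note reduce to one rule: add-path applies to a prefix unless the policy matches it with a reject action. The Python sketch below models this using only a prefix-set match condition. It is illustrative, not the SR Linux policy engine. The function names and the tuple representation of the policy are hypothetical, and it assumes a prefix-set entry matches any prefix that falls inside the base prefix and whose length is within the mask-length range.

```python
import ipaddress


def in_prefix_set(prefix: str, entries: list) -> bool:
    """entries is a list of (base_prefix, min_length, max_length)."""
    net = ipaddress.ip_network(prefix)
    for base, min_len, max_len in entries:
        base_net = ipaddress.ip_network(base)
        if (net.version == base_net.version
                and min_len <= net.prefixlen <= max_len
                and net.subnet_of(base_net)):
            return True
    return False


def add_path_applies(prefix: str, policy=None) -> bool:
    """policy is None or a (prefix_set_entries, action) tuple."""
    if policy is None:
        return True                # no policy: applies to all prefixes
    entries, action = policy
    if not in_prefix_set(prefix, entries):
        return True                # no match: follow the afi-safi configuration
    return action == "accept"      # match: accept applies, reject does not


if __name__ == "__main__":
    pset = [("10.3.192.0/21", 21, 24)]
    print(add_path_applies("10.3.194.0/24", (pset, "reject")))
    print(add_path_applies("10.3.200.0/24", (pset, "reject")))
```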

BGP graceful restart

BGP graceful restart allows a router whose control plane has temporarily stopped functioning because of a system failure or a software upgrade to return to service with minimal disruption to the network.

To do this, the router relies on neighbor routers, which have also been configured for graceful restart, to maintain forwarding state while the router restarts. These neighbor routers are known as helper routers. The helper routers and the restarting router continue forwarding traffic using the previously learned routing information from the restarting router. Other routers in the network are not notified about the restarting router, so network traffic is not disrupted.

When graceful restart is enabled on the SR Linux and its neighbor, the two routers exchange information about graceful restart capability, including the Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) of the routes supported for graceful restart.

While the router restarts, the helper router marks the routes from the restarting router as stale, but continues to use them for traffic forwarding. When the BGP session is reestablished, the restarting router indicates to the helper router that it has restarted. The helper router then sends the restarting router any BGP RIB updates, followed by an End-of-RIB (EOR) marker indicating that the updates are complete. The restarting router then makes its own updates and sends them to the helper router, followed by an EOR marker.

Graceful restart is used in conjunction with the In-Service Software Upgrade (ISSU) feature, which can be used to upgrade 7220 IXR-D2 and D3 systems while maintaining non-stop forwarding. During the ISSU, a warm reboot brings down the control and management planes while the NOS reboots, and graceful restart maintains the forwarding state in peers. You can use a tools command to validate that the SR Linux and its peers support warm reboot, including graceful restart configuration. See the SR Linux Software Installation Guide for more information.

Configuring graceful restart

You can configure graceful restart for the BGP instance. The SR Linux operates as a helper router for neighbor routers when they are restarting, assuming graceful restart is also enabled on the neighbors. Enabling graceful restart also indicates to the neighbors that they can serve as helper routers when the SR Linux itself is restarting.

When operating as a helper router, the SR Linux marks the routes from the restarting router as stale, but continues to use them for forwarding for a period of time while the neighbor router restarts. After this period expires, the SR Linux deletes the routes. The stale-routes-time parameter configures the amount of time in seconds the routes remain stale before they are deleted.

The requested-restart-time parameter configures the amount of time in seconds to wait for a graceful restart-capable neighbor to re-establish a TCP connection. After this period expires, the helper router deletes the stale routes it preserved on behalf of its neighbor routers.

The following example enables graceful restart and sets both timers to 300 seconds:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                graceful-restart {
                    admin-state enable
                    stale-routes-time 300
                    requested-restart-time 300
                }
            }
        }
    }

Following a restart, by default the system waits 600 seconds (10 minutes) to receive EOR markers from all helper routers for all address families that were up before the restart. After this time elapses, the system assumes convergence has occurred and sends its own EOR markers to its peers. You can configure the amount of time the system waits to receive EOR markers to be from 0 to 3,600 seconds.

For example, the following configures the amount of time the system waits to receive EOR markers to 270 seconds.

--{ * candidate shared default }--[  ]--
# info system warm-reboot
    system {
        warm-reboot {
            bgp-max-wait 270
        }
    }

BGP unnumbered peering

In a typical large-scale data center using BGP, leaf and spine switches are interconnected in a Clos topology, and each device establishes a single-hop eBGP session with each of its physically connected peers. The sessions come up as eBGP because of the ASN allocation scheme; it is common practice to assign a unique ASN to every leaf switch (TOR) in a cluster and a different unique ASN to the set of spine switches to which those TORs are connected. The allocated ASNs are typically private ASNs in the range 4200000000 to 4294967294, although this is not always the case.
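As an illustration of this allocation scheme, the following Python sketch assigns one shared ASN to the spine layer and a unique ASN to each leaf, all taken from the private range quoted above. The function names, device names, and the choice of base ASN are hypothetical.

```python
# 4-byte private ASN range quoted above: 4200000000 to 4294967294 inclusive.
PRIVATE_4BYTE_ASNS = range(4200000000, 4294967295)


def is_private_4byte_asn(asn):
    return asn in PRIVATE_4BYTE_ASNS


def allocate_asns(leaves, spines, base=4200000000):
    """Give all spines the base ASN and each leaf the next unused ASN."""
    asns = {spine: base for spine in spines}
    for offset, leaf in enumerate(leaves, start=1):
        asns[leaf] = base + offset
    return asns


plan = allocate_asns(["leaf1", "leaf2", "leaf3"], ["spine1", "spine2"])
print(plan)
```

Because every leaf-to-spine link then joins two different ASNs, each single-hop session comes up as eBGP.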

For this type of configuration, BGP unnumbered peering can be a useful solution. BGP unnumbered peering is the dynamic setup of one or more single-hop BGP sessions over a network segment that has no globally-unique IPv4 or IPv6 addresses. Each router connected to the network segment is assumed to have an IPv6-enabled interface to the network, and these interfaces have IPv6 link-local addresses that are typically auto-generated by each router from the interface MAC addresses.
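The following Python sketch shows one common way a link-local address is derived from a MAC address, assuming the modified EUI-64 method described in RFC 4291. Whether a specific interface uses this method depends on the platform; the function name and the MAC address in the usage line are hypothetical.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build an fe80::/64 address from a MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # invert the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix (fe80 followed by 48 zero bits) plus the 64-bit interface ID
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


print(link_local_from_mac("7c:fe:90:fc:7a:d8"))
```

With this hypothetical MAC address, the result is fe80::7efe:90ff:fefc:7ad8, the same form as the link-local addresses used in the examples in this section.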

How sessions are established using BGP unnumbered peering

The BGP speakers configured for BGP unnumbered peering on a network segment discover each other by sending and receiving ICMPv6 router advertisement (RA) messages.

Consider an example of Router A and Router B, which are both connected to an unnumbered interface and configured for BGP unnumbered dynamic session setup. The BGP session between the two routers is established in the following sequence:

  1. Router B sends an ICMPv6 RA message on its interface b1.

    Assuming the RA message is unsolicited, the source IP address of this message is the link-local address of interface b1 (fe80::7efe:90ff:fefc:7ad8), and the destination IP address is the all-nodes multicast address.

  2. Asynchronously, Router A sends an ICMPv6 RA message on its interface a1.

    The source IP address is the link-local address of interface a1 (fe80::7efe:90ff:fefc:7bd8), and the destination IP address is the all-nodes multicast address.

  3. Router A receives the RA message on interface a1. Because a1 is a subinterface that is configured as a dynamic neighbor interface (that is, added to the BGP dynamic-neighbors interface list), the software process responsible for ICMPv6 relays the information to BGP.

  4. BGP checks if it already has a BGP session with fe80::7efe:90ff:fefc:7ad8.

    • If BGP already has this session and it is up, or BGP is in the process of establishing this session, BGP takes no further action. Router B may have started the same process moments before Router A.

    • If BGP does not have a session with this link-local address, then a new TCP connection is initiated toward fe80::7efe:90ff:fefc:7ad8.

  5. When the TCP connection is established, the BGP OPEN message sent by Router A encodes a local-AS and other capabilities that come from the configuration of the peer-group associated with interface a1.
  6. Router A receives a BGP OPEN message from Router B and accepts that OPEN message, proceeding to move toward the BGP established state, if the OPEN message encodes an acceptable peer AS number (in one of the allowed-peer-as ranges configured for interface a1). The address families supported by the session are based on the usual MP-BGP negotiation.
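The session check in step 4 can be sketched in Python as follows. This is a simplified model under stated assumptions: the class, method, and state names are hypothetical, and TCP setup and OPEN processing are reduced to state labels.

```python
class DynamicPeering:
    """Toy model of BGP reacting to RA messages on one dynamic-neighbor subinterface."""

    def __init__(self):
        self.sessions = {}  # peer link-local address -> "connecting" or "established"

    def on_router_advertisement(self, source):
        state = self.sessions.get(source)
        if state in ("connecting", "established"):
            # A session exists or is already being set up: nothing to do.
            return "no-action"
        # No session with this link-local address: start a TCP connection toward it.
        self.sessions[source] = "connecting"
        return "connect"

    def on_open_accepted(self, source):
        self.sessions[source] = "established"


router_a = DynamicPeering()
print(router_a.on_router_advertisement("fe80::7efe:90ff:fefc:7ad8"))  # connect
print(router_a.on_router_advertisement("fe80::7efe:90ff:fefc:7ad8"))  # no-action
```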

BGP dynamic-neighbors interface list

To enable dynamic peering, you add subinterfaces to the BGP dynamic-neighbors interface list in the SR Linux configuration.

When a subinterface is added to the dynamic-neighbors interface list:
  • BGP automatically accepts incoming BGP connections to the IPv6 link-local address of that subinterface, subject to the configured max-sessions limit for the subinterface.

    For the connection to be accepted, the source address must be an IPv6 link-local address (which may or may not also be a configured neighbor address), and the reported ASN of the peer must match the relevant configuration. If the source address does not match a configured neighbor address, the session is set up according to the peer-group associated with the subinterface. This applies even if a dynamic-neighbors accept match-prefix entry matches the source IPv6 link-local address; the peer-group associated with that entry is not used.

  • BGP registers for IPv6 RA messages on the subinterface. Whenever the source of one of these RA messages matches an IPv6 link-local address for which there is currently no established BGP session, the system attempts to create a BGP session to that address, as long as this does not exceed the configured max-sessions limit for the subinterface. The session is set up according to the configured peer-group associated with the subinterface.

When a BGP session is established over a subinterface in the dynamic-neighbors interface list:
  • Changes to the allowed-peer-as ranges associated with the subinterface take effect only the next time BGP attempts to establish a session.
  • Non-arrival of expected ICMPv6 RA messages on the subinterface does not trigger teardown of associated sessions.
  • Existing triggers for tearing down a session apply as normal (for example, hold-timer expiration, BFD timeout, clear bgp neighbor commands, and so on).
  • If the link-local address of a dynamic peer is configured as a static neighbor address, the dynamic session is immediately torn down and replaced by the static session.

When a subinterface is deleted from the dynamic-neighbors interface list, all dynamic sessions associated with that subinterface (excluding sessions set up by static configuration of the neighbor) are torn down immediately.

A BGP session that was previously established on an unnumbered interface and subsequently torn down can only be re-established if the subinterface is configured in the dynamic-neighbors interface list and a recent ICMPv6 RA message is received.

Configuration overrides for dynamic peers on unnumbered interfaces

When a dynamic BGP session is initiated or accepted on an interface that is tied to a peer-group, most of the parameters relevant to that session come from the configuration of that peer-group, with the following exceptions:

  • multihop maximum-hops is always 1 (for both eBGP and iBGP peers).
  • transport local-address is always the link-local address of the specified interface.
  • next-hop-self is always true. The neighbor is not presumed to have reachability to off-link destinations.
  • transport passive-mode is always false. BGP always initiates a connection when informed by ICMPv6, unless it already has a connection.
  • afi-safi ipv4-unicast ipv4-unicast receive-ipv6-next-hops is always true.
  • afi-safi ipv4-unicast ipv4-unicast advertise-ipv6-next-hops and evpn advertise-ipv6-next-hops are always true.
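One way to picture these exceptions is as a fixed set of values applied on top of the peer-group settings. In the following Python sketch the dictionary keys are flattened, hypothetical labels for the parameters listed above, not actual SR Linux configuration paths.

```python
def effective_session_params(peer_group_params, interface_link_local):
    """Overlay the always-forced values on a copy of the peer-group settings."""
    params = dict(peer_group_params)
    params.update({
        "multihop.maximum-hops": 1,
        "transport.local-address": interface_link_local,
        "next-hop-self": True,
        "transport.passive-mode": False,
        "ipv4-unicast.receive-ipv6-next-hops": True,
        "ipv4-unicast.advertise-ipv6-next-hops": True,
        "evpn.advertise-ipv6-next-hops": True,
    })
    return params


group = {"transport.passive-mode": True, "next-hop-self": False, "timers.hold-time": 90}
print(effective_session_params(group, "fe80::7efe:90ff:fefc:7bd8"))
```

Settings that are not in the exception list, such as the hold time in this example, still come from the peer group.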

Peer AS validation for dynamic peers on unnumbered interfaces

When a BGP OPEN message is received from an unnumbered dynamic neighbor, the reported AS number of the peer is checked to determine if it is acceptable to allow the peering to proceed.

For a dynamic session associated with a subinterface, the peer AS is acceptable only if it matches one of the allowed-peer-as elements of the dynamic-neighbors interface list entry for the subinterface, or if the peer AS is equal to the local AS (implying an iBGP session).
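The acceptance rule can be sketched in Python as follows. The function name is hypothetical, and representing each allowed-peer-as element as an inclusive (low, high) pair, with a single ASN written as (n, n), is an assumption made for illustration.

```python
def peer_as_acceptable(peer_as, local_as, allowed_peer_as):
    """Return True if the AS number reported in the OPEN message is acceptable."""
    if peer_as == local_as:
        # Same AS as the local router: treated as an iBGP session.
        return True
    return any(low <= peer_as <= high for low, high in allowed_peer_as)


allowed = [(4294967200, 4294967200)]
print(peer_as_acceptable(4294967200, 4200000001, allowed))  # True
print(peer_as_acceptable(4200000099, 4200000001, allowed))  # False
```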

Configuring BGP unnumbered peering

To configure BGP unnumbered peering, you add subinterfaces to the BGP dynamic-neighbors interface list, and specify the peer autonomous system numbers from which incoming TCP connections to the BGP well-known port are accepted.

The following example adds a subinterface to the BGP dynamic-neighbors interface list.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp dynamic-neighbors interface ethernet-1/1.1
    network-instance default {
        protocols {
            bgp {
                dynamic-neighbors {
                    interface ethernet-1/1.1 {
                        peer-group bgp_peer_group_0
                        allowed-peer-as [
                            4294967200
                        ]
                    }
                }
            }
        }
    }

In this example, subinterface ethernet-1/1.1 is added to the BGP dynamic-neighbors interface list. This subinterface must be enabled for IPv6 and configured to accept and send IPv6 RA messages. It does not require any IPv4 addresses or global-unicast IPv6 addresses.

Incoming TCP connections to port 179 received on this subinterface that are sourced from an IPv6 link-local address and destined for the IPv6 link-local address of this subinterface are automatically accepted. IPv6 RA messages received on this subinterface automatically trigger BGP session setup toward the sender of these messages, if there is not already an established BGP session.

Peer group bgp_peer_group_0 is associated with dynamic BGP neighbors on this subinterface. Parameters configured for this peer-group are used for establishing the dynamic BGP session, with the exceptions described in Configuration overrides for dynamic peers on unnumbered interfaces.

ASN 4294967200 is configured as an allowed peer AS for dynamic BGP neighbors on this subinterface. If the BGP OPEN message from a peer on this subinterface contains a MyAS number that is not an allowed peer AS, then a NOTIFICATION is sent to the peer with the indication Bad Peer AS.

Prefix-limit for BGP peers

SR Linux places a limit on the number of IPv4, IPv6, or EVPN route prefixes that can be received from a peer or from individual members of a peer group. When this prefix-limit is exceeded, SR Linux tears down the BGP session with the peer, then re-establishes the session.

You can configure the following settings for the prefix-limit:

  • max-received-routes

    By default, the prefix-limit that triggers a BGP session teardown is 4294967295 routes, which is the maximum number of routes that can be received from the peer (counting routes accepted and rejected by import policy). You can configure a different prefix-limit by setting a value for max-received-routes.

  • prefix-limit-restart-timer

    By default, after a BGP session is torn down because the prefix limit was exceeded, the BGP session is re-established immediately. You can configure the number of seconds the system waits before re-establishing the session by setting a value for prefix-limit-restart-timer.

  • prevent-teardown

    You can prevent the BGP session from being torn down when the prefix-limit is exceeded by setting prevent-teardown to true.

  • warning-threshold-pct

    You can set a warning threshold for the prefix-limit. When the number of routes received from the peer (counting routes accepted and rejected by import policy) reaches a specified percentage of the max-received-routes setting, BGP raises a warning log event. The default threshold is 90%.
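The interaction of these settings can be sketched in Python as follows. The function is a hypothetical model, not SR Linux code; it assumes the warning is raised when the count reaches the threshold and the teardown occurs only when the count exceeds max-received-routes.

```python
def evaluate_prefix_limit(received, max_received_routes=4294967295,
                          warning_threshold_pct=90, prevent_teardown=False):
    """Return (warning, teardown) for the number of routes received from a peer."""
    # Integer arithmetic avoids rounding issues at the threshold boundary.
    warning = received * 100 >= max_received_routes * warning_threshold_pct
    teardown = received > max_received_routes and not prevent_teardown
    return warning, teardown


print(evaluate_prefix_limit(27_000_000, max_received_routes=30_000_000))
print(evaluate_prefix_limit(30_000_001, max_received_routes=30_000_000))
```

With max-received-routes set to 30000000 and the default 90% threshold, the warning is raised at 27000000 received routes.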

When upgrading from a release earlier than 23.3.1 to Release 23.3.1 or later, the upgrade script checks the configured max-received-routes setting for IPv4 and IPv6 routes. If the configured max-received-routes setting is equal to 4294967295 for IPv4 or IPv6 routes, then prevent-teardown for IPv4 or IPv6 routes is set to true.

Configuring the prefix-limit for BGP peers

To configure the prefix-limit, you can set the maximum number of routes that can be received from a peer, set the number of seconds the system waits to re-establish a session following a teardown, and disable the prefix-limit for a peer.

The commands that set the maximum number of routes from a peer and disable the prefix-limit can be applied to IPv4 and IPv6 routes. The settings can be applied to a specific peer or to a peer group. If there is no setting for a specific peer, the setting for the peer group applies. If there is no setting for either the peer or the peer group, the system default applies.
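The order of precedence (peer, then peer group, then system default) can be sketched in Python as follows. The function and dictionary names are hypothetical; the two default values are the ones stated in this section.

```python
SYSTEM_DEFAULTS = {
    "max-received-routes": 4294967295,
    "prefix-limit-restart-timer": 0,
}


def resolve_setting(name, neighbor_cfg, group_cfg):
    """Return the neighbor value if set, else the group value, else the default."""
    for cfg in (neighbor_cfg, group_cfg):
        if name in cfg:
            return cfg[name]
    return SYSTEM_DEFAULTS[name]


group = {"max-received-routes": 500000}
neighbor = {"prefix-limit-restart-timer": 60}
print(resolve_setting("max-received-routes", neighbor, group))         # 500000
print(resolve_setting("prefix-limit-restart-timer", neighbor, group))  # 60
```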

Configure maximum number of routes from a peer

The following example configures the maximum number of IPv4 routes that can be received from a peer.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 afi-safi ipv4-unicast ipv4-unicast 
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    afi-safi ipv4-unicast {
                        ipv4-unicast {
                            prefix-limit {
                                max-received-routes 30000000
                            }
                        }
                    }
                }
            }
        }
    }

If max-received-routes is not configured for the peer, the max-received-routes setting for the peer group applies. If max-received-routes is not configured for the peer group, the system default maximum of 4294967295 routes applies.

Configure prefix-limit restart timer

The following example sets the number of seconds the system waits to re-establish a BGP session with a peer after the session was torn down because the max-received-routes value was exceeded.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 timers
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    timers {
                        prefix-limit-restart-timer 60
                    }
                }
            }
        }
    }

If the prefix-limit-restart-timer is not configured for the peer, the prefix-limit-restart-timer setting for the peer group applies. If the prefix-limit-restart-timer is not configured for the peer group, the BGP session with the peer is re-established immediately after teardown (that is, prefix-limit-restart-timer = 0 seconds).

Disable the prefix-limit

The following example disables the prefix-limit for IPv4 routes received from a peer, so that the BGP session is not torn down if the maximum number of IPv4 routes received from the peer is exceeded.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 afi-safi ipv4-unicast ipv4-unicast   
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    afi-safi ipv4-unicast {
                        ipv4-unicast {
                            prefix-limit {
                                prevent-teardown true
                            }
                        }
                    }
                }
            }
        }
    }

RT constrained route advertisement

RT constrained route advertisement (RT-constrain) is a mechanism that allows a BGP router to advertise route target (RT) membership information to its BGP peers to indicate interest in receiving only BGP routes tagged with specific RT extended communities. Upon receiving this information, BGP peers restrict the advertised BGP routes to only those requested routes, which can minimize control-plane load for protocol traffic and RIB memory.

The RT membership information is carried in a special type of MP-BGP route called an RTC route; the associated AFI is 1 and the SAFI is 132. For two routers to exchange RT membership NLRI, they must advertise the corresponding AFI/SAFI to each other during capability negotiation. The use of MP-BGP means RT membership NLRI is propagated, loop-free, within an AS and between ASes using well-known BGP route selection and advertisement rules.

Note: Extended community-based ORF can also be used for RT-based route filtering, but RT-constrain has distinct advantages over extended community-based ORF: RT-constrain is more widely supported, simpler to configure, and its distribution scope is not limited to a direct peer.

RT-constrain on SR Linux for various router types

This section describes how RT-constrain operates on SR Linux for PE, route reflector, and ASBR routers.

PE routers

A PE router originates RT membership NLRI to its peers (often route reflectors) to prevent these peers from sending unnecessary VPN routes to the PE. Usually a PE originates one RTC route for each import route-target associated with a local VPRN service or IP-VRF network-instance. An RT is an extended community with type subcode value 0x02 and type code value 0x00, 0x01, or 0x02.

RTC routes originated by a PE are by default automatically advertised to all RTC peers, without the need for an export policy to accept them. Each RTC route has a prefix (carried in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes) and path attributes (for example, ORIGIN, AS_PATH).

The prefix length (in bits) of an RTC route can be 0 (for the default RTC route), 32, or a number in the range 48 to 96. The prefix value is the concatenation of the origin AS (a 4-byte value representing the 2- or 4-octet AS of the originating router, as configured under network-instance.protocols.bgp.autonomous-system) and 0 or 16-64 bits of an RT extended community.

This NLRI format allows RTs originated by the same AS and having the same N most significant bits to be advertised in a single RTC route with prefix length 32+N (N >= 16). However, the SR Linux implementation does not make use of this flexibility when originating RTC routes; only the default RTC route with a prefix length of 0, or fully specified RTC routes with a prefix length of 96, are advertised to RTC peers.
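The following Python sketch builds the two prefix forms mentioned above: the default RTC route (length 0) and a fully specified RTC route (length 96). The function names are hypothetical, and only the two-octet-AS route target format (type 0x00, subtype 0x02) is shown.

```python
import struct


def two_octet_as_route_target(asn, value):
    """Encode an 8-byte RT: type 0x00, subtype 0x02, 2-byte AS, 4-byte value."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, value)


def rtc_prefix(origin_as, route_target=None):
    """Return (prefix_length_in_bits, prefix_bytes) for an RTC route."""
    if route_target is None:
        return 0, b""  # default RTC route
    if len(route_target) != 8:
        raise ValueError("a route target extended community is 8 bytes")
    # 4-byte origin AS followed by the full 64-bit route target: 96 bits.
    return 96, struct.pack("!I", origin_as) + route_target


length, prefix = rtc_prefix(65000, two_octet_as_route_target(65000, 100))
print(length, prefix.hex())
```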

Route reflectors

A route reflector (RR) propagates RTC routes according to a special set of rules outlined below. An RR typically advertises the default RTC route to each of its clients so that it receives all VPN routes belonging to the cluster. This is achieved by enabling (setting to true) afi-safi.route-target.send-default-route in the group or neighbor configuration context. The default RTC route is a special type of RTC route encoded in one of the following ways:

  • prefix-length = 0 (SR Linux routers always generate default RTC routes with this encoding)
  • prefix-length = 32, encoding only the origin AS value
  • prefix-length = 48, encoding the origin AS value, plus 16 bits of an RT type

Sending the default RTC route to a peer conveys a request to receive all VPN routes (regardless of RT extended community) from that peer. The advertisement of the default RTC route to a peer does not suppress other more-specific RTC routes from being sent to that peer.
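A receiver therefore needs to recognize all three default encodings, even though SR Linux itself originates only the first. The following Python sketch classifies an RTC route by prefix length alone; the function name and return labels are hypothetical.

```python
def classify_rtc_prefix_length(prefix_length):
    """Classify an RTC route as "default" or "specific" from its prefix length."""
    if prefix_length in (0, 32, 48):
        return "default"
    if 49 <= prefix_length <= 96:
        return "specific"
    raise ValueError("invalid RTC prefix length: %d" % prefix_length)


for length in (0, 32, 48, 96):
    print(length, classify_rtc_prefix_length(length))
```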

ASBRs

An ASBR (for example, a model-B ASBR or model-C route reflector) propagates route target membership NLRI to eBGP peers in other autonomous systems to limit received routes to only those needed for services in the local AS and those to be propagated through the local AS. When the RTC route propagation path includes multiple ASNs, SR Linux routers choose only the single best path for reverse advertisement of VPN routes; advertisement of VPN routes using RTC multipaths is not supported.

Best-path selection and RTC route re-advertisement

If multiple RTC routes are received for the same prefix, then standard BGP best-path selection procedures determine the best of these routes. BGP does not check for reachability of the BGP next-hop of RTC routes, so this does not factor into best-path selection.

The best RTC route per prefix is re-advertised to RTC peers based on the following rules:

  • The best path for a default RTC route (prefix-length 0, origin AS only with prefix-length 32 or origin AS plus 16 bits of an RT type with prefix-length 48) is never propagated to another peer.

  • A PE with only iBGP RTC peers that is neither a route reflector nor an ASBR does not re-advertise the best RTC route to any RTC peer, because of standard iBGP split-horizon rules.

  • A route reflector that receives its best RTC route for a prefix from a client peer re-advertises that route (subject to export policies) to all of its client and non-client iBGP peers (including the originator), per standard RR operation. When the route is re-advertised to client peers, the RR should (i) set the ORIGINATOR_ID to its own router ID and (ii) modify the NEXT_HOP to be its local address for the session (for example, the system IP).

  • A route reflector that receives its best RTC route for a prefix from a non-client peer re-advertises that route (subject to export policies) to all of its client peers, per standard RR operation. Normally no route is advertised to non-client peers in this scenario, but if the RR has a non-best path for the prefix from any of its clients, it should advertise the best of the client-advertised paths to all non-client peers. No ORIGINATOR_ID or NEXT_HOP manipulation is required in this case.

  • An ASBR that is neither a PE nor a route reflector that receives its best RTC route for a prefix from an iBGP peer re-advertises that route (subject to export policies) to its eBGP peers. It modifies the NEXT_HOP and AS_PATH of the re-advertised route per standard BGP rules. No aggregation of RTC routes is supported.

  • An ASBR that is neither a PE nor a route reflector that receives its best RTC route for a prefix from an eBGP peer re-advertises that route (subject to export policies) to its eBGP and iBGP peers. When re-advertised routes are sent to eBGP peers, the ASBR modifies the NEXT_HOP and AS_PATH per standard BGP rules. No aggregation of RTC routes is supported.
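The rules above can be condensed into a lookup of which peer classes receive the best RTC route. The following Python sketch is a deliberate simplification: the role and peer-class labels are hypothetical, export policies are ignored, and the special case in which a route reflector advertises the best client-learned path to non-clients is omitted.

```python
def readvertise_targets(role, learned_from, is_default):
    """Return the set of peer classes to which the best RTC route is re-advertised.

    role: "pe", "rr", or "asbr"
    learned_from: "client" or "non-client" for an RR; "ibgp" or "ebgp" otherwise
    """
    if is_default:
        return set()  # a default RTC route is never propagated
    if role == "pe":
        return set()  # iBGP split horizon
    if role == "rr":
        if learned_from == "client":
            return {"client", "non-client"}
        return {"client"}
    if role == "asbr":
        if learned_from == "ibgp":
            return {"ebgp"}
        return {"ebgp", "ibgp"}
    raise ValueError("unknown role: %s" % role)


print(readvertise_targets("rr", "client", is_default=False))
```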

Using RTC routes to filter advertised routes

When RT-constrain is configured on a session that also supports VPN address families using route targets (for example, VPN-IPv4, VPN-IPv6, MVPN, EVPN), advertisement of the VPN routes is affected as follows:

  • When the session comes up, the advertisement of all VPN routes is delayed until the initial set of RTC routes has been received from the peer (that is, all RTC routes in the peer’s RIB-OUT); this is the waiting state. The waiting state ends when an End-of-RIB marker for AFI/SAFI=1/132 is received from the peer, or a certain amount of time has elapsed since the session was established (this amount of time is hard-coded to 60 seconds and applies when the peer does not support sending the End-of-RIB marker). When the waiting state ends, VPN routes are sent to the peer based on received RTC routes (see below), and the session transitions to the ready state.
  • When the session is in the ready state, received RTC routes are acted upon immediately. SR Linux does not expect and wait for a ROUTE REFRESH message from the peer.

    If S1 is the set of routes previously advertised to the peer, and S2 is the set of routes to be advertised based on the most recent received RTC routes, then:

    • the routes in S1 but not in S2 are withdrawn immediately (subject to MRAI)
    • the routes in S2 but not in S1 are advertised immediately (subject to MRAI)
  • If a default RTC route (best or non-best) is received from an eBGP or iBGP peer, the VPN routes advertised to the peer are the set of VPN routes in the LOC-RIB that meet all of the following conditions:

    • are eligible for advertisement to the eBGP or iBGP peer per BGP route advertisement rules
    • have not been rejected by manually configured export policies
    • have not been advertised to the peer
  • If an RTC route for a prefix (origin-AS = A1, RT = A2/n, n > 48), best or non-best, is received from an iBGP peer in autonomous system A1, the VPN routes advertised to the iBGP peer are the set of VPN routes in the LOC-RIB that meet all of the following conditions:

    • are eligible for advertisement to the iBGP peer per BGP route advertisement rules
    • have not been rejected by manually configured export policies
    • carry at least one route target extended community with value A2 in the n most-significant bits
    • have not been advertised to the peer
  • If the best RTC route for a prefix (origin-AS = A1, RT = A2/n, n > 48) is received from an iBGP peer in autonomous system B, the VPN routes advertised to the iBGP peer are the set of VPN routes in the LOC-RIB that meet all of the following conditions:

    • are eligible for advertisement to the iBGP peer per BGP route advertisement rules
    • have not been rejected by manually configured export policies
    • carry at least one route target extended community with value A2 in the n most-significant bits
    • have not been advertised to the peer
  • If the best RTC route for a prefix (origin-AS = A1, RT = A2/n, n > 48) is received from an eBGP peer, the VPN routes advertised to the eBGP peer are the set of VPN routes in the LOC-RIB that meet all of the following conditions:

    • are eligible for advertisement to the eBGP peer per BGP route advertisement rules
    • have not been rejected by manually configured export policies
    • carry at least one RT extended community with value A2 in the n most-significant bits
    • have not been advertised to the peer
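The route-target match and the S1/S2 comparison described above can be sketched in Python as follows. This is a simplified model: the function names are hypothetical, eligibility rules and export policies are ignored, and the bits argument is assumed to count the most-significant bits of the 8-byte route target that must match (0 bits stands for a default RTC route).

```python
def rt_matches(route_target, rt_prefix, bits):
    """Compare the `bits` most-significant bits of two 8-byte route targets."""
    full, rem = divmod(bits, 8)
    if route_target[:full] != rt_prefix[:full]:
        return False
    if rem:
        mask = (0xFF << (8 - rem)) & 0xFF
        return (route_target[full] & mask) == (rt_prefix[full] & mask)
    return True


def routes_to_advertise(loc_rib, rtc_routes):
    """loc_rib: VPN prefix -> list of RTs; rtc_routes: list of (rt_prefix, bits)."""
    return {
        prefix
        for prefix, rts in loc_rib.items()
        if any(rt_matches(rt, want, bits) for rt in rts for want, bits in rtc_routes)
    }


def diff_updates(s1, s2):
    """Return (routes to withdraw, routes to advertise)."""
    return s1 - s2, s2 - s1


RT_100 = bytes.fromhex("0002fde800000064")
RT_200 = bytes.fromhex("0002fde8000000c8")
loc_rib = {"vpn-a": [RT_100], "vpn-b": [RT_200]}

s1 = routes_to_advertise(loc_rib, [(RT_100, 64), (RT_200, 64)])
s2 = routes_to_advertise(loc_rib, [(RT_100, 64)])  # peer withdrew interest in RT_200
print(diff_updates(s1, s2))
```

In this usage example, the peer stops advertising the RTC route for the second route target, so vpn-b moves from S1 into the withdraw set and nothing new is advertised.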

BGP RIB YANG model for RTC Routes

The following table lists the information available in the SR Linux BGP RIB YANG model for RTC routes in the RIB-IN and RIB-OUT contexts.

Table 1. RTC route information available in the SR Linux BGP RIB YANG model

Path                                                            Description
bgp-rib.afi-safi.route-target.rib-in-out.rib-in-pre.routes      Contains the full set of RTC routes received from all peers.
bgp-rib.afi-safi.route-target.rib-in-out.rib-in-post.routes     Contains the full set of RTC routes received from all peers, after import policy modification.
bgp-rib.afi-safi.route-target.rib-in-out.rib-out-post.routes    Contains the full set of RTC routes advertised to each peer, after export policy modification.

BGP configuration management

Managing the BGP configuration on SR Linux can include the following tasks:

  • Modifying an AS number
  • Deleting a BGP neighbor
  • Deleting a BGP peer group
  • Resetting BGP peer connections

Modifying an ASN

You can modify the ASN on the router, but the new ASN does not take effect until the BGP instance is restarted, either by administratively disabling/enabling the BGP instance, or by rebooting the system with the new configuration.

--{ * candidate shared default }--[ network-instance default ]--
# protocols bgp autonomous-system 95002
# protocols bgp admin-state disable
# protocols bgp admin-state enable

All established BGP sessions are taken down when the BGP instance is disabled.

Deleting a BGP neighbor

Use the delete command to delete a BGP neighbor from the configuration.

--{ * candidate shared default }--[ network-instance default ]--
# delete protocols bgp neighbor 192.168.11.1

Deleting a BGP peer group

Use the delete command to delete the settings for a BGP peer group from the configuration.

--{ * candidate shared default }--[ network-instance default ]--
# delete protocols bgp group headquarters1

Resetting BGP peer connections

To refresh the connections between BGP neighbors, you can issue a hard or soft reset. A hard reset tears down the TCP connections and returns the sessions to the IDLE state. A soft reset sends route-refresh messages to each peer. The hard or soft reset can be issued to a specific peer, to peers in a specific peer-group, or to peers with a specific ASN.

Issue a hard reset

The following command hard-resets the connections to the BGP neighbors in a peer group that have a specified ASN. The hard reset applies both to configured peers and dynamic peers.

# tools network-instance default protocols bgp group headquarters1 reset-peer peer-as 95002
/network-instance[name=default]/protocols/bgp/group[group-name=headquarters1]:
    Successfully executed the tools clear command.

Issue a soft reset

The following command soft-resets the connection to BGP neighbors that have a specified ASN. The soft reset applies both to configured peers and dynamic peers.

# tools network-instance default protocols bgp soft-clear peer-as 95002
/network-instance[name=default]/protocols/bgp:
    Successfully executed the tools clear command.

BGP shortcuts

Note: BGP shortcuts are supported on 7730 SXR and 7250 IXR platforms.

With BGP shortcuts, SR Linux can include LDP LSPs, segment routing (SR-ISIS) tunnels, or TE Policy SR-MPLS tunnels in the BGP algorithm calculations. In this case, tunnels operate as logical interfaces directly connected to remote nodes in the network. Because the BGP algorithm treats the tunnels in the same way as a physical interface (being a potential output interface), the algorithm can select a destination node together with an output tunnel to resolve the next-hop, using the tunnel as a shortcut through the network to the destination. With BGP shortcuts enabled, next-hop resolution determines whether to use a local interface or a tunnel to resolve the BGP next-hop.

Tunnel resolution mode

As part of the configuration for BGP shortcuts, you must define the tunnel-resolution mode (prefer, require, or disabled). This mode determines the order of preference and fallback of using tunnels in the tunnel table to resolve the next-hop instead of using routes in the FIB.

Configuring BGP shortcuts over segment routing

  1. In the default network instance, define the tunnel-resolution mode for the BGP protocol.
    This setting determines the order of preference and the fallback when using tunnels in the tunnel table instead of routes in the FIB. Available options are as follows:
    • require

      requires tunnel table lookup only

    • prefer

      prefers tunnel table lookup over FIB lookup

    • disabled (default)

      performs FIB lookup only

  2. Set the allowed tunnel types for next-hop resolution.
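The effect of the three modes on next-hop resolution can be sketched in Python as follows. This is a simplified model, not SR Linux code: the function name and data shapes are hypothetical, the tunnel table is a list of (type, endpoint) pairs, and the FIB lookup is an exact-match dictionary rather than a longest-prefix match.

```python
def resolve_next_hop(mode, next_hop, tunnel_table, fib, allowed_tunnel_types):
    """Return ("tunnel", type), ("fib", interface), or None if unresolved."""
    if mode in ("require", "prefer"):
        for tunnel_type, endpoint in tunnel_table:
            if endpoint == next_hop and tunnel_type in allowed_tunnel_types:
                return ("tunnel", tunnel_type)
        if mode == "require":
            return None  # tunnel table lookup only, no fallback to the FIB
    # "disabled", or "prefer" with no usable tunnel: use the FIB.
    if next_hop in fib:
        return ("fib", fib[next_hop])
    return None


tunnels = [("sr-isis", "10.0.0.2"), ("ldp", "10.0.0.3")]
fib = {"10.0.0.3": "ethernet-1/1.1"}
print(resolve_next_hop("prefer", "10.0.0.2", tunnels, fib, {"sr-isis"}))
print(resolve_next_hop("prefer", "10.0.0.3", tunnels, fib, {"sr-isis"}))
print(resolve_next_hop("require", "10.0.0.3", tunnels, fib, {"sr-isis"}))
```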

Configure IPv4 BGP shortcuts

The following example shows the BGP next-hop resolution configuration to allow IPv4 SR-ISIS tunnels, with the tunnel mode set to prefer.

--{ * candidate shared default }--[ ]--
# info network-instance default protocols bgp afi-safi ipv4-unicast ipv4-unicast next-hop-resolution ipv4-next-hops tunnel-resolution
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-unicast {
                    ipv4-unicast {
                        next-hop-resolution {
                            ipv4-next-hops {
                                tunnel-resolution {
                                    mode prefer
                                    allowed-tunnel-types [
                                        sr-isis
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    }

Configure IPv6 BGP shortcuts

The following example shows the BGP next-hop resolution configuration to allow IPv6 SR-ISIS tunnels, with the tunnel mode set to prefer.

--{ * candidate shared default }--[ ]--
# info network-instance default protocols bgp afi-safi ipv6-unicast ipv6-unicast next-hop-resolution ipv6-next-hops tunnel-resolution
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv6-unicast {
                    ipv6-unicast {
                        next-hop-resolution {
                            ipv6-next-hops {
                                tunnel-resolution {
                                    mode prefer
                                    allowed-tunnel-types [
                                        sr-isis
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    }

BGP TCP MSS

BGP uses TCP transport, and BGP messages are carried as TCP segments. SR Linux allows you to control the Maximum Segment Size (MSS) for each TCP segment based on the Path MTU discovery settings.

Path MTU discovery can be enabled or disabled per network instance in SR Linux. The default is enabled.

Within the BGP hierarchy, path MTU discovery can be enabled and disabled at different configuration levels. The supported configuration paths are:
  • network-instance.protocols.bgp.transport.mtu-discovery
  • network-instance.protocols.bgp.group.transport.mtu-discovery
  • network-instance.protocols.bgp.neighbor.transport.mtu-discovery

By default, BGP path MTU discovery inherits the value from the network instance for all BGP sessions. The inherited value can be overridden at any of the configuration levels listed above. When an ICMP fragmentation-needed message is received and BGP path MTU discovery is enabled, the system reduces the MTU for the BGP session according to the ICMP message, subject to the lower bound configured under the system-level min-path-mtu.

--{ * candidate shared default }--[ ]--
# info network-instance default 
    network-instance default {
        mtu {
            path-mtu-discovery true
        }
    } 
--{ * candidate shared default }--[ ]--
# info system mtu 
    system {
        mtu {
            min-path-mtu 552
        }
    }
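
The following sketch illustrates the inheritance and the min-path-mtu lower bound described above. It is a simplified model, not SR Linux code: the function names are hypothetical, and the assumption that the most specific level (neighbor, then group, then BGP instance) takes precedence is ours, not something the text above states.

```python
from typing import Optional


def mtu_discovery_enabled(network_instance: bool,
                          bgp: Optional[bool] = None,
                          group: Optional[bool] = None,
                          neighbor: Optional[bool] = None) -> bool:
    """Effective path MTU discovery setting for one BGP session.

    None means "not configured at this level". Assumed precedence:
    neighbor, then group, then BGP instance, then network instance.
    """
    for setting in (neighbor, group, bgp):
        if setting is not None:
            return setting
    return network_instance


def reduce_session_mtu(current_mtu: int, icmp_mtu: int,
                       min_path_mtu: int) -> int:
    """MTU after an ICMP fragmentation-needed message is acted upon.

    The MTU is only ever reduced, and never below min_path_mtu.
    """
    if icmp_mtu >= current_mtu:
        return current_mtu
    return min(current_mtu, max(icmp_mtu, min_path_mtu))
```

With min-path-mtu set to 552 as in the output above, an ICMP message advertising an MTU of 300 would leave the session MTU at 552 in this model.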

Configuring BGP TCP MSS

The maximum size of each TCP segment is controlled by configuring the TCP MSS (tcp-mss) value.

SR Linux supports configuring TCP MSS at BGP instance, peer group, and neighbor configuration levels. The supported range for the tcp-mss value is 536-9446 bytes, and the default value is 1024 bytes.

The tcp-mss value is inherited down the configuration levels within the BGP hierarchy. If no tcp-mss is configured for a BGP neighbor, the value is taken from the BGP peer group if it is configured there; otherwise, it is taken from the BGP instance. The default BGP instance tcp-mss value is used if neither the BGP peer group nor the neighbor has a configured tcp-mss.

If the configured or inherited tcp-mss value is higher than the BGP path MTU value, the tcp-mss value is ignored, and the BGP path MTU value is used as the operational TCP MSS.
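
As a rough illustration of the inheritance and the path MTU cap described above, the following sketch resolves the tcp-mss for one neighbor. The function names are hypothetical and the code models only the rules stated in this section.

```python
from typing import Optional

DEFAULT_TCP_MSS = 1024  # default BGP instance value stated in this section


def inherited_tcp_mss(neighbor: Optional[int] = None,
                      group: Optional[int] = None,
                      instance: Optional[int] = None) -> int:
    """Return the configured or inherited tcp-mss for a neighbor.

    None means "not configured at this level". The neighbor value wins,
    then the peer group, then the BGP instance, then the default.
    """
    for value in (neighbor, group, instance):
        if value is not None:
            return value
    return DEFAULT_TCP_MSS


def operational_tcp_mss(tcp_mss: int, path_mtu: int) -> int:
    """Apply the cap: a tcp-mss above the BGP path MTU is ignored."""
    return path_mtu if tcp_mss > path_mtu else tcp_mss
```

For example, a neighbor with no tcp-mss of its own in a peer group configured with 1400 inherits 1400; if the BGP path MTU is 1200, the operational value in this model is 1200.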

Configuring BGP instance tcp-mss

The following example configures the BGP instance tcp-mss value.

info from state network-instance default protocols bgp transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                transport {
                    tcp-mss 1024
                }
            }
        }
    }

Configuring BGP peer group tcp-mss

The following example configures the BGP peer group tcp-mss.

info from state network-instance default protocols bgp group transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                group test {
                    transport {
                        tcp-mss 1024
                    }
                }
            }
        }
    }

Configuring BGP neighbor tcp-mss

The following example configures the BGP neighbor tcp-mss.

info from state network-instance default protocols bgp neighbor 192.168.0.1 transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.0.1 {
                    transport {
                        tcp-mss 1012
                    }
                }
            }
        }
    }

If the configured or inherited tcp-mss value is higher than the operational path MTU value, the tcp-mss value is ignored and the path MTU value is used as the operational TCP MSS.

Error handling for BGP update messages

BGP update messages are used to transfer routing information between BGP peers. Some errors in BGP update messages are considered critical; for example, it is a critical error if the Network Layer Reachability Information (NLRI) cannot be extracted and parsed from an update message. Other errors are considered non-critical; examples include incorrect attribute flag settings, missing mandatory path attributes, and an incorrect next-hop length or format.

In SR Linux, critical errors in BGP update messages trigger a session reset. Non-critical errors are handled using the treat-as-withdraw or attribute-discard approaches to error handling. This error-handling behavior for BGP update messages is not configurable in SR Linux.

BGP multipath

BGP multipath is the ability to install a BGP route into the FIB so that the ECMP algorithm load-balances traffic across multiple BGP next-hops that come from different multipath-eligible RIB-INs for the same prefix or NLRI address family.

In a network-instance, you can enable BGP multipath for an address family and specify the maximum number of BGP ECMP next-hops for BGP routes that have an NLRI belonging to the address family.

Configuring BGP multipath

To configure BGP multipath, set the allow-multiple-as parameter to true. When you do this, BGP is allowed to build a multipath set using BGP routes with a different neighbor AS (the most recent AS in the AS_PATH).

When allow-multiple-as is set to false (the default), BGP is allowed to use non-best paths for ECMP only if they meet the multipath criteria and have the same neighbor AS as the best path.

The maximum-paths parameter configures the maximum number of BGP ECMP next-hops for BGP routes with an NLRI belonging to the specified address family.

Note:
  • When a BGP prefix is covered by a resilient-hash-prefix entry, the maximum number of BGP next-hops used for load balancing is controlled by the network-instance ip-load-balancing resilient-hash-prefix <ip-prefix> max-paths value.

  • When BGP is resolved by an unweighted, non-resilient-hash IGP route, the maximum number of paths towards the BGP next-hop is controlled by the IGP configuration; for example, the IS-IS max-ecmp-paths value.

  • When BGP is resolved by a weighted, non-resilient-hash IGP route, the maximum number of paths towards the BGP next-hop is controlled by the IGP configuration; for example, the IS-IS max-ecmp-hash-buckets-per-next-hop-group value.

  • When BGP is resolved by a static, non-resilient-hash route, the maximum number of paths towards the BGP next-hop is controlled by the static NHG configuration.
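
The interaction of allow-multiple-as and maximum-paths can be sketched as follows. This is a simplified model rather than the SR Linux implementation: the Path class and its fields are hypothetical, the multipath_eligible flag stands in for the multipath criteria (which are not detailed here), each path is assumed to have a distinct next-hop, and the next-hop limits from the note above are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    next_hop: str
    neighbor_as: int                 # most recent AS in the AS_PATH
    multipath_eligible: bool = True  # stand-in for the multipath criteria


def multipath_set(best: Path, others: List[Path],
                  allow_multiple_as: bool, maximum_paths: int) -> List[Path]:
    """Build the ECMP set: the best path plus qualifying non-best paths."""
    selected = [best]
    for path in others:
        if len(selected) >= maximum_paths:
            break  # cap reached
        if not path.multipath_eligible:
            continue
        if not allow_multiple_as and path.neighbor_as != best.neighbor_as:
            continue  # a different neighbor AS is not allowed
        selected.append(path)
    return selected
```

In this model, a non-best path learned from a different neighbor AS joins the set only when allow_multiple_as is true, and the set never grows beyond maximum_paths entries.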

Enable BGP multipath

The following example enables BGP multipath and specifies the maximum number of BGP ECMP next-hops for BGP routes with an NLRI belonging to the ipv4-unicast address family:

--{ * candidate shared default }--[  ]--
# info network-instance n1 protocols bgp afi-safi ipv4-unicast
    network-instance n1 {
        protocols {
            bgp {
                afi-safi ipv4-unicast {
                    multipath {
                        allow-multiple-as true
                        maximum-paths 10
                    }
                }
            }
        }
    }

BGP FRR

Note: In the current release, BGP FRR is supported only on 7730 SXR and 7250 IXR platforms and only with the BGP-LU address families (IPv4 and IPv6).

BGP fast reroute (FRR) combines indirection techniques in the forwarding plane and pre-computation of BGP backup paths in the control plane to support fast reroute of BGP traffic around unreachable or failed BGP next-hops. BGP FRR serves an important role in ensuring high availability of services that rely on BGP for transport.

When BGP FRR is enabled for routes of a specific address family, BGP attempts to program a backup path for every destination that has only a single primary next-hop. Because ECMP routes toward a destination already have path resiliency, no backup path is programmed for those routes. The FRR backup path is the best BGP route that passes through a different next-hop than the primary path. Backup paths are pre-programmed into the FIB to ensure they are ready for switchover immediately after the primary path fails.
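
The backup path selection rule can be illustrated with a minimal sketch. The function name and its inputs are hypothetical; the ranked list is assumed to contain the next-hops of the BGP routes for one destination, already ordered from best to worst.

```python
from typing import List, Optional


def frr_backup_next_hop(primary_next_hops: List[str],
                        ranked_next_hops: List[str]) -> Optional[str]:
    """Return the next-hop of the FRR backup path, or None.

    primary_next_hops -- next-hops currently used for the destination
    ranked_next_hops  -- next-hops of the BGP routes, best route first
    """
    if len(primary_next_hops) != 1:
        # ECMP destinations already have path resiliency: no backup.
        return None
    primary = primary_next_hops[0]
    for next_hop in ranked_next_hops:
        if next_hop != primary:
            # Best route that uses a different next-hop than the primary.
            return next_hop
    return None  # no suitable backup path exists
```

A destination whose only routes share the primary next-hop gets no backup path in this model, which matches the condition that a backup is installed only when a suitable one exists.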

The switchover to the backup path can be triggered by multiple events, including:

  • IGP topology change that makes the BGP next-hop of the primary path unreachable
  • BFD session failure that implies the BGP next-hop of the primary path is unreachable

BGP FRR also uses indirection techniques in datapath programming to allow a single update from the control plane to trigger the failover for multiple prefixes at once.

Configuring BGP FRR

To configure BGP FRR, use the backup-paths command to install a backup path for every NLRI in the address family when a suitable one exists. Optionally, you can configure BGP FRR together with BGP add-path.

Configure BGP FRR

The following example configures BGP FRR for IPv4 BGP labeled-unicast addresses.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp afi-safi ipv4-labeled-unicast ipv4-labeled-unicast backup-paths
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-labeled-unicast {
                    ipv4-labeled-unicast {
                        backup-paths {
                            install true
                        }
                    }
                }
            }
        }
    }