BGP

Border Gateway Protocol (BGP) is an inter-AS routing protocol. An autonomous system (AS) is a network or a group of routers logically organized and controlled by a common network administration. BGP enables routers to exchange network reachability information, including information about the ASes that traffic must traverse to reach routers in other ASes.

ASes use BGP to share routing information with other ASes, such as routes to each destination and information about the route or AS path. Routing tables contain lists of known routers, reachable addresses, and associated path cost metrics for each router. BGP uses the information and path attributes to compile a network topology.

To set up BGP routing, participating routers must have BGP enabled and be assigned to an AS, and the neighbor (peer) relationships must be specified. A router typically belongs to only one AS.

This section describes the minimal configuration necessary to set up BGP on SR Linux. This includes the following:

  • Global BGP configuration, including specifying the autonomous system number (ASN) of the router, as well as the router ID.

  • BGP peer group configuration, which specifies settings that are applied to BGP neighbor routers in the peer group.

  • BGP neighbor configuration, which specifies the peer group to which each BGP neighbor belongs, as well as settings specific to the neighbor, including the AS to which the router is peered.

For information about all other BGP settings, see the SR Linux online help, as well as the SR Linux Advanced Solutions Guide and the SR Linux Data Model Reference.

BGP global configuration

Global BGP configuration includes specifying the autonomous system number (ASN) of the router and the router ID.

Configuring an ASN

You can configure an autonomous system number (ASN) for a router. An ASN is a globally unique value that associates a router with a specific AS. Each router participating in BGP must have an ASN specified.

The following example configures an ASN for a router:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                autonomous-system 65002
            }
        }
    }

Configuring the router ID

The router ID, expressed like an IP address, uniquely identifies the router and indicates the origin of routing information exchanged between autonomous systems. The router ID is configured at the BGP level.

The following example configures a router ID:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                router-id 2.2.2.2
            }
        }
    }

Configuring a BGP peer group

You can configure a BGP peer group. A BGP peer group is a collection of related BGP neighbors. The group name should be a descriptive name for the group.

All parameters configured for a peer group are inherited by each peer (neighbor) in the peer group, but a group parameter can be overridden for specific neighbors in the configuration of that neighbor.

The following example configures the administrative state and trace options for a BGP peer group. These settings apply to all of the BGP neighbors that are members of this group, unless specifically overridden in the neighbor configuration.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                group headquarters1 {
                    admin-state enable
                    traceoptions {
                        flag events {
                        }
                        flag graceful-restart {
                        }
                    }
                }
            }
        }
    }

Configuring BGP neighbors

After configuring a BGP group name and assigning options, you can add neighbors within the same autonomous system to create internal BGP (iBGP) connections, neighbors in different autonomous systems to create external BGP (eBGP) peers, or both. All parameters configured for the peer group to which the neighbor is assigned are applied to the neighbor, but a group parameter can be overridden on a per-neighbor basis.

The following example configures parameters for two BGP neighbors. The peer-group parameter configures both nodes to use the settings specified for the headquarters1 group. The group settings apply unless they are specifically overridden in the neighbor configuration.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    peer-group headquarters1
                    description "default network-instance bgp neighbor to Node A"
                    peer-as 65001
                    local-as as-number 65002 {
                    }
                    multihop {
                        admin-state enable
                        maximum-hops 3
                    }
                    failure-detection {
                        enable-bfd true
                        fast-failover true
                    }
                }
                neighbor 192.168.13.2 {
                    peer-group headquarters1
                    description "default network-instance bgp neighbor to Node C"
                    peer-as 65003
                    local-as as-number 65002 {
                    }
                    failure-detection {
                        enable-bfd true
                        fast-failover true
                    }
                }
            }
        }
    }

BGP peer import and export policies

SR Linux supports BGP import policies and export policies:

  • An import policy is a sequence of match conditions and action rules that are run on certain routes received from BGP peers. If a received route is rejected by an import policy rule, then depending on the address family and BGP configuration options, the route may be discarded, or it may be stored in the RIB but considered invalid and excluded from best-path selection.

  • An export policy is a sequence of match conditions and action rules that are run on routes that have been selected for advertisement to a BGP peer. If a route that would normally be advertised to a peer (RIB-OUT) is rejected by an export policy rule, then the actual advertisement of the route to the peer is blocked.

BGP import and export policy rules

You can configure BGP import or export policies at the BGP global, peer-group, and neighbor levels. The policies operate according to the following rules:

For BGP import policies:

  • If the configuration of a BGP neighbor explicitly specifies an import policy, then this is the policy used to filter inbound routes from the peer, and import policies defined at higher levels of configuration (group or BGP instance as a whole) are ignored.
  • If the configuration of a BGP neighbor does not specify an import policy, but the peer group to which it belongs does specify an import policy, then the peer-group import policy is used to filter inbound routes from the peer, and the import policy defined at the BGP instance level is ignored.
  • If the configuration of a BGP neighbor does not specify an import policy, and the peer group to which it belongs also does not specify an import policy, then the BGP-instance import-policy is used to filter inbound routes from the peer.
  • If there is no import-policy at any level of configuration that applies to a BGP neighbor, then the handling of received routes depends on the peer session type, as follows:
    • If the peer session type is iBGP, then all received routes are accepted.
    • If the peer session type is eBGP, then all received routes are rejected by default; however, this can be changed by configuring the ebgp-default-policy import-reject-all setting to false.

For BGP export policies:

  • If the configuration of a BGP neighbor explicitly specifies an export policy, then this is the policy used to filter outbound routes sent to the peer, and export policies defined at higher levels of configuration (group or BGP instance as a whole) are ignored.
  • If the configuration of a BGP neighbor does not specify an export policy, but the peer group to which it belongs does specify an export policy, then the peer-group export policy is used to filter outbound routes sent to the peer, and the export policy defined at the BGP instance level is ignored.
  • If the configuration of a BGP neighbor does not specify an export policy, and the peer group to which it belongs also does not specify an export policy, then the BGP-instance export policy is used to filter outbound routes sent to the peer.
  • If there is no export policy at any level of configuration that applies to a BGP neighbor, then the advertisement of routes in the local RIB-IN depends on the RIB-IN type and the peer session type as follows:
    • If the peer session type is iBGP, then all non-imported BGP RIB-INs are accepted and therefore eligible for advertisement.
    • If the peer session type is eBGP, then all routes are rejected by default, meaning that no routes are eligible for advertisement; however, this can be changed by configuring the ebgp-default-policy export-reject-all setting to false, in which case all non-imported BGP RIB-INs are accepted and eligible for advertisement.
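
The inheritance rules above can be sketched in a few lines of Python. This is only an illustration of the logic described in this section, not SR Linux code: the function names, the "ibgp"/"ebgp" strings, and the reject_all flag (standing in for the ebgp-default-policy import-reject-all and export-reject-all settings) are assumptions made for the example.

```python
from typing import Optional


def effective_policy(
    neighbor: Optional[str],
    group: Optional[str],
    instance: Optional[str],
) -> Optional[str]:
    """Return the most specific configured policy name, or None.

    The neighbor-level policy wins over the peer-group policy, which
    wins over the BGP-instance policy. Less specific levels are ignored.
    """
    for policy in (neighbor, group, instance):
        if policy is not None:
            return policy
    return None


def default_accepts(session_type: str, reject_all: bool = True) -> bool:
    """Default handling when no policy applies at any level.

    session_type is "ibgp" or "ebgp" (hypothetical labels). reject_all
    models the relevant ebgp-default-policy setting, which defaults to
    rejecting all routes on eBGP sessions.
    """
    if session_type == "ibgp":
        return True
    return not reject_all
```

For example, a neighbor with no policy of its own in a group that has one uses the group policy, and an eBGP neighbor with no policy at any level accepts nothing until the reject-all setting is turned off.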

AFI-SAFI policy attachment rules

The OpenConfig BGP model supports the attachment of AFI-SAFI-specific route policies to selected peers, peer-groups, or to the BGP instance as a whole. This implies, for example, that for a single peer you could have one export policy for IPv4_UNICAST routes advertised to the peer, and a different export policy for IPv6_UNICAST routes advertised to the peer. To accommodate this kind of configuration, SR Linux includes the following contexts supporting attachment of import and export policies:

  • network-instance.protocols.bgp
  • network-instance.protocols.bgp.afi-safi
  • network-instance.protocols.bgp.group
  • network-instance.protocols.bgp.group.afi-safi
  • network-instance.protocols.bgp.neighbor
  • network-instance.protocols.bgp.neighbor.afi-safi

The policy that applies to a route is based on the following order of priority:

  1. AFI-SAFI at neighbor level

  2. AFI-SAFI at group level

  3. AFI-SAFI at instance level

  4. General policy at neighbor level

  5. General policy at group level

  6. General policy at instance level

  7. Default policy
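
The priority order above amounts to an ordered lookup. The following Python sketch shows one way to express it; the dictionary layout, the level names, and the "default" return value are assumptions chosen for the illustration and do not reflect how SR Linux stores policy attachments.

```python
# Lookup order from the list above: AFI-SAFI-specific attachments first
# (neighbor, group, instance), then general attachments in the same order.
PRIORITY = [
    ("neighbor", True),
    ("group", True),
    ("instance", True),
    ("neighbor", False),
    ("group", False),
    ("instance", False),
]


def select_policy(attached: dict, afi_safi: str) -> str:
    """Pick the policy that applies to a route of the given AFI-SAFI.

    attached maps (level, afi_safi) to a policy name for AFI-SAFI-specific
    attachments, and (level, None) for general attachments.
    """
    for level, specific in PRIORITY:
        key = (level, afi_safi if specific else None)
        if key in attached:
            return attached[key]
    return "default"  # step 7: no attachment at any level
```

With a group-level IPv4 unicast policy and a general neighbor-level policy attached, IPv4 unicast routes use the group AFI-SAFI policy, while IPv6 unicast routes fall through to the general neighbor policy.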

eBGP multihop

External BGP (eBGP) multihop can be used to form adjacencies when eBGP neighbors are not directly connected to each other; for example, when a non-BGP router is between the eBGP neighbors.

BGP TCP/IP packets sent toward an eBGP neighbor by default have a TTL value of 1. If the BGP TCP/IP packets need to pass through more than one router to reach their destination, the TTL decrements to 0, and the packets are dropped.

To prevent this, you can enable multihop for the eBGP neighbor and specify the maximum number of hops for BGP TCP/IP packets sent to the neighbor. This allows the eBGP neighbor to be indirectly connected by up to the specified number of hops.

When multihop is not enabled, the IP TTL for eBGP sessions is set to 1, and the IP TTL for iBGP sessions is set to 64. Enabling multihop and configuring the maximum number of hops to a neighbor allows an eBGP session to span multiple hops, or limits an iBGP session to a single hop, if required.

If multihop is enabled and the maximum-hops parameter is configured for a BGP peer group, the settings are applied to the members of the group. If the multihop configuration for a neighbor is changed, the session with the neighbor must be disconnected and re-established for the change to take effect.
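
The TTL behavior described above can be modeled with a short Python sketch. It assumes that, with multihop enabled, the initial TTL equals the configured maximum-hops value, and that each router between the two BGP speakers decrements the TTL by one and drops the packet when it reaches 0. The function names are hypothetical.

```python
from typing import Optional


def initial_ttl(session_type: str, maximum_hops: Optional[int] = None) -> int:
    """Initial IP TTL for BGP packets sent toward a neighbor.

    Passing maximum_hops models multihop being enabled; the TTL is then
    assumed to equal maximum-hops. Otherwise the defaults from the text
    apply: 1 for eBGP and 64 for iBGP.
    """
    if maximum_hops is not None:
        return maximum_hops
    return 1 if session_type == "ebgp" else 64


def reaches_neighbor(ttl: int, routers_in_between: int) -> bool:
    """Return True if a packet survives every intermediate router."""
    for _ in range(routers_in_between):
        ttl -= 1          # each forwarding router decrements the TTL
        if ttl <= 0:
            return False  # TTL expired before reaching the neighbor
    return ttl > 0
```

Under these assumptions, a default eBGP session (TTL 1) reaches only a directly connected neighbor, while maximum-hops 2 allows one router between the two peers.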

Configuring eBGP multihop

To configure eBGP multihop, you enable it for the eBGP neighbor, and specify a value for the maximum-hops parameter. Additionally, the next-hop to the neighbor must be configured so that the two systems can establish a BGP session.

Enable multihop for an eBGP neighbor

The maximum-hops parameter is set to 2, which increases the TTL for BGP TCP/IP packets sent toward the eBGP neighbor, allowing the neighbor to be indirectly connected by up to 2 hops.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    multihop {
                        admin-state enable
                        maximum-hops 2
                    }
                }
            }
        }
    }

Configure a route to the next-hop toward the eBGP neighbor

--{ * candidate shared default }--[  ]--
# info network-instance default static-routes
    network-instance default {
        static-routes {
            route 192.168.11.0/24 {
                next-hop-group static-ipv4-grp
            }
        }
    }
--{ * candidate shared default }--[  ]--
# info network-instance default next-hop-groups group static-ipv4-grp
    network-instance default {
        next-hop-groups {
            group static-ipv4-grp {
                nexthop 1 {
                    ip-address 192.168.22.22
                }
            }
        }
    }

AS path options

You can set the following options for handling the AS_PATH in received BGP routes:

  • Allow own AS – configures the router to process received routes when its own ASN appears in the AS_PATH.

  • Replace peer AS – configures the router to replace the ASN of the peer router in the AS_PATH with its own ASN.

  • Remove private AS path numbers – configures the router to either delete private AS numbers, shortening the AS path length, or replace private AS numbers with the local AS number used toward the peer, maintaining the AS path length.

Configuring allow-own-as

You can use the allow-own-as option to configure the router to process received routes when its own ASN appears in the AS_PATH. Normally, when the ASN of a router appears in the AS_PATH of a received route, it is considered a loop, and the route is discarded. The allow-own-as option sets the maximum number of times the global ASN of the router can appear in a received AS_PATH before the route is considered a loop and treated as invalid. The default is 0.

The following example configures the router to process received routes where its own ASN appears in the AS_PATH a maximum of 1 time:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                autonomous-system 65001
                as-path-options {
                    allow-own-as 1
                }
            }
        }
    }
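
The loop check that allow-own-as relaxes can be illustrated as a simple count. This Python sketch assumes the AS_PATH is available as a flat list of AS numbers; the function name is hypothetical.

```python
def is_as_path_loop(as_path: list, own_asn: int, allow_own_as: int = 0) -> bool:
    """Return True if the route should be treated as an AS loop.

    The route is a loop when the router's own ASN appears in the
    AS_PATH more than allow_own_as times. With the default of 0, a
    single occurrence is enough.
    """
    return as_path.count(own_asn) > allow_own_as
```

With allow-own-as set to 1 as in the example above, a path containing 65001 once is accepted by a router in AS 65001, and a path containing it twice is still rejected.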

Configuring replace-peer-as

You can configure the router to replace the peer ASN in AS_PATH with its own ASN. Normally, two sites having the same ASN would not be able to reach each other directly because the receiving router would see its own ASN in the AS_PATH and consider it a loop. You can overcome this by configuring the router to replace the peer ASN in the AS_PATH with its own ASN. When the replace-peer-as option is set to true, the router replaces every occurrence of the peer AS number that is present in the advertised AS_PATH with the local ASN used toward the peer.

The following example configures the router to replace the ASN of the peer with its own ASN:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    replace-peer-as true
                }
            }
        }
    }
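
As an illustration of the substitution that replace-peer-as performs, the following Python sketch rewrites a flat list of AS numbers. It models only the replacement step described above and not the rest of the advertisement process; the function name and the sample AS numbers are hypothetical.

```python
def replace_peer_as(as_path: list, peer_asn: int, local_asn: int) -> list:
    """Replace every occurrence of the peer ASN with the local ASN.

    The path length is unchanged, so the receiving router no longer
    sees its own ASN and does not treat the route as a loop.
    """
    return [local_asn if asn == peer_asn else asn for asn in as_path]
```

For instance, when advertising to a peer in AS 65010 from a router using local AS 65000, the path 65010 64999 65010 becomes 65000 64999 65000.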

Configuring remove-private-as

You can configure how the router handles private AS numbers: either delete them, shortening the AS path length, or replace private AS numbers with the local AS number used toward the peer, which maintains the AS path length.

You can configure the router to delete or replace private AS numbers that appear before the first occurrence of a non-private ASN in the sequence of most recent ASNs in the AS path. You can also configure the router to ignore private AS numbers when they are the same as the peer ASN.

Configure the router to delete private AS numbers

The following example configures the router to delete private AS numbers (2-byte and 4-byte) from the advertised AS path toward all peers. This shortens the AS path.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode delete
                    }
                }
            }
        }
    }

The following example configures the router to replace private AS numbers with the local AS number used toward the peer. This keeps the AS path the same length.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                    }
                }
            }
        }
    }

The following example configures the router to replace only private AS numbers that appear before the first occurrence of a non-private ASN in the sequence of most recent ASNs in the AS path.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                        leading-only true
                    }
                }
            }
        }
    }

The following example configures the router to ignore private AS numbers (neither delete nor replace them) when they are the same as the peer AS number.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                as-path-options {
                    remove-private-as {
                        mode replace
                        ignore-peer-as true
                    }
                }
            }
        }
    }
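
The combined effect of the mode, leading-only, and ignore-peer-as options can be sketched in Python as follows. This is a simplified model, not the SR Linux implementation: it assumes a flat AS path ordered from most recent ASN to oldest, and it takes the private ranges from RFC 6996 (64512 to 65534 and 4200000000 to 4294967294). The sample AS numbers outside those ranges are from the documentation range.

```python
from typing import Optional

# Private-use ASN ranges per RFC 6996 (2-byte and 4-byte).
PRIVATE_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_private(asn: int) -> bool:
    return any(low <= asn <= high for low, high in PRIVATE_RANGES)


def remove_private_as(
    as_path: list,
    local_asn: int,
    mode: str = "delete",
    leading_only: bool = False,
    ignore_peer_as: bool = False,
    peer_asn: Optional[int] = None,
) -> list:
    """Delete or replace private ASNs in a path (most recent ASN first)."""
    if mode not in ("delete", "replace"):
        raise ValueError("mode must be 'delete' or 'replace'")
    result = []
    leading = True  # True until the first non-private ASN is seen
    for asn in as_path:
        if not is_private(asn):
            leading = False
            result.append(asn)
            continue
        untouched = (leading_only and not leading) or (
            ignore_peer_as and asn == peer_asn
        )
        if untouched:
            result.append(asn)
        elif mode == "replace":
            result.append(local_asn)  # path length is preserved
        # mode == "delete": the private ASN is dropped, shortening the path
    return result
```

Given the path 64512 64513 64501 64514 and local AS 64500, delete mode leaves only 64501, replace mode yields 64500 64500 64501 64500, and replace mode with leading-only leaves the trailing 64514 in place.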

BGP MED

The Multi-Exit Discriminator (MED) attribute is an optional attribute that can be added to routes advertised to an eBGP peer to influence the flow of inbound traffic to the AS. The MED attribute carries a 32-bit metric value. A lower metric is better than a higher metric when MED is compared by the BGP decision process.

By default, the MED attribute is compared only if the routes come from the same neighbor AS. You can optionally configure SR Linux to compare the MED value from different ASes when selecting the best route.

Configuring always-compare-med

To configure SR Linux to use MED values from different ASes in the BGP decision process (tie-break between routes for the same NLRI), set the always-compare-med option to true. By default, this option is set to false, which uses MED values in the BGP decision process only for routes from the same neighbor AS.

The following example sets the always-compare-med option to true:

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp best-path-selection
    network-instance default {
        protocols {
            bgp {
                best-path-selection {
                    always-compare-med true
                }
            }
        }
    }
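
The effect of always-compare-med on the MED step of the decision process can be illustrated with a short Python sketch. It models only the MED comparison between two candidate routes, not the full best-path algorithm, and the Route type and function name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    neighbor_as: int  # AS from which the route was received
    med: int          # 32-bit MED metric; lower is better


def med_preferred(a: Route, b: Route, always_compare_med: bool = False) -> Optional[Route]:
    """Return the route preferred by MED, or None if MED does not decide.

    By default MED is compared only between routes from the same
    neighbor AS. With always_compare_med, it is compared regardless.
    """
    if not always_compare_med and a.neighbor_as != b.neighbor_as:
        return None
    if a.med == b.med:
        return None
    return a if a.med < b.med else b
```

Two routes from different neighbor ASes with MED 10 and MED 50 are not separated by MED under the default setting; with always-compare-med set to true, the MED 10 route is preferred.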

Route reflection

In a standard iBGP configuration, all BGP speakers within an AS must be fully meshed to ensure that all externally learned routes are redistributed through the entire AS.

Configuring route reflection provides an alternative to the full BGP mesh requirement: instead of peering with all other iBGP routers in the network, each iBGP router only peers with a router configured as a route reflector.

An AS can be divided into multiple clusters, with each cluster containing at least one route reflector, which redistributes routes to the clients in the cluster. The clients within the cluster do not need to maintain a full peering mesh between each other. They only require a peering to the route reflectors in their cluster. The route reflectors must maintain a full peering mesh between all non-clients within the AS.
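
To give a sense of scale, the following Python sketch compares the number of iBGP sessions needed for a full mesh with the number needed in a single cluster. It assumes that every client peers with every route reflector in the cluster, that the reflectors are fully meshed with each other, and that there are no other non-client routers. The function names are hypothetical.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 1) -> int:
    """iBGP sessions in one cluster with the given number of reflectors."""
    clients = routers - reflectors
    client_sessions = clients * reflectors
    reflector_mesh = reflectors * (reflectors - 1) // 2
    return client_sessions + reflector_mesh
```

For 10 routers, a full mesh needs 45 sessions, a single route reflector needs 9, and two redundant route reflectors need 17.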

Configuring route reflection

To configure a route reflector, you assign it a cluster ID and specify which neighbors are clients and which are non-clients. Clients receive reflected routes, and non-clients are treated as standard iBGP peers.

The following example configures the router to be a route reflector for two clients, SRL-1 and SRL-2. The router is assigned cluster ID 0.0.0.1.

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                route-reflector {
                    cluster-id 0.0.0.1
                }
                neighbor SRL-1 {
                    route-reflector {
                        cluster-id 0.0.0.1
                        client true
                    }
                }
                neighbor SRL-2 {
                    route-reflector {
                        cluster-id 0.0.0.1
                        client true
                    }
                }
            }
        }
    }

BGP graceful restart

BGP graceful restart allows a router whose control plane has temporarily stopped functioning because of a system failure or a software upgrade to return to service with minimal disruption to the network.

To do this, the router relies on neighbor routers, which have also been configured for graceful restart, to maintain forwarding state while the router restarts. These neighbor routers are known as helper routers. The helper routers and the restarting router continue forwarding traffic using the previously learned routing information from the restarting router. Other routers in the network are not notified about the restarting router, so network traffic is not disrupted.

When graceful restart is enabled on the SR Linux and its neighbor, the two routers exchange information about graceful restart capability, including the Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) of the routes supported for graceful restart.

While the router restarts, the helper router marks the routes from the restarting router as stale, but continues to use them for traffic forwarding. When the BGP session is reestablished, the restarting router indicates to the helper router that it has restarted. The helper router then sends the restarting router any BGP RIB updates, followed by an End-of-RIB (EOR) marker indicating that the updates are complete. The restarting router then makes its own updates and sends them to the helper router, followed by an EOR marker.

Graceful restart is used in conjunction with the In-Service Software Upgrade (ISSU) feature, which can be used to upgrade 7220 IXR-D2 and D3 systems while maintaining non-stop forwarding. During the ISSU, a warm reboot brings down the control and management planes while the NOS reboots, and graceful restart maintains the forwarding state in peers. You can use a tools command to validate that the SR Linux and its peers support warm reboot, including graceful restart configuration. See the SR Linux Software Installation Guide for more information.

Configuring graceful restart

You can configure graceful restart for the BGP instance. The SR Linux operates as a helper router for neighbor routers when they are restarting, assuming graceful restart is also enabled on the neighbors. Enabling graceful restart also indicates to the neighbors that they can serve as helper routers when the SR Linux itself is restarting.

When operating as a helper router, the SR Linux marks the routes from the restarting router as stale, but continues to use them for forwarding for a period of time while the neighbor router restarts. After this period expires, the SR Linux deletes the routes. The stale-routes-time parameter configures the amount of time in seconds the routes remain stale before they are deleted.

The requested-restart-time parameter configures the amount of time in seconds to wait for a graceful restart-capable neighbor to re-establish a TCP connection. After this period expires, the helper router deletes the stale routes it preserved on behalf of its neighbor routers.

The following example enables graceful restart and sets both timers to 300 seconds:

--{ * candidate shared default }--[  ]--
# info network-instance default
    network-instance default {
        protocols {
            bgp {
                graceful-restart {
                    admin-state enable
                    stale-routes-time 300
                    requested-restart-time 300
                }
            }
        }
    }
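
The helper-side handling of stale routes can be sketched as a small Python class. This is a simplified model under several assumptions: time is an integer number of seconds supplied by the caller, restart_time is the restart time the helper applies to the restarting neighbor, and routes still marked stale when the End-of-RIB marker arrives are deleted, which is the behavior specified in RFC 4724. The class and method names are hypothetical.

```python
class HelperRib:
    """Routes learned from one neighbor, as kept by a helper router."""

    def __init__(self, stale_routes_time: int, restart_time: int):
        self.stale_routes_time = stale_routes_time
        self.restart_time = restart_time
        self.stale = {}          # prefix -> True if marked stale
        self.lost_at = None      # time at which the session went down
        self.reconnected = False

    def learn(self, prefix: str) -> None:
        # A new or repeated advertisement clears the stale mark.
        self.stale[prefix] = False

    def session_lost(self, now: int) -> None:
        # Keep forwarding on the routes, but mark all of them stale.
        self.lost_at = now
        self.reconnected = False
        for prefix in self.stale:
            self.stale[prefix] = True

    def session_reestablished(self) -> None:
        self.reconnected = True

    def end_of_rib(self) -> None:
        # Routes that were not re-advertised before EOR are removed.
        self._purge_stale()
        self.lost_at = None

    def tick(self, now: int) -> None:
        """Apply the timers; call periodically with the current time."""
        if self.lost_at is None:
            return
        elapsed = now - self.lost_at
        restart_expired = not self.reconnected and elapsed >= self.restart_time
        if restart_expired or elapsed >= self.stale_routes_time:
            self._purge_stale()
            self.lost_at = None

    def _purge_stale(self) -> None:
        self.stale = {p: s for p, s in self.stale.items() if not s}

    def usable(self) -> list:
        """Prefixes still used for forwarding, stale or not."""
        return sorted(self.stale)
```

In this model, a neighbor that reconnects and re-advertises only some of its prefixes keeps just those prefixes after the EOR marker, and a neighbor that never reconnects loses all of its routes once the restart time expires.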

Following a restart, by default the system waits 600 seconds (10 minutes) to receive EOR markers from all helper routers for all address families that were up before the restart. After this time elapses, the system assumes convergence has occurred and sends its own EOR markers to its peers. You can configure the amount of time the system waits to receive EOR markers to be from 0 to 3,600 seconds.

For example, the following sets the amount of time the system waits to receive EOR markers to 270 seconds.

--{ * candidate shared default }--[  ]--
# info system warm-reboot
    system {
        warm-reboot {
            bgp-max-wait 270
        }
    }

BGP unnumbered peering

In a typical large-scale data center using BGP, leaf and spine switches are interconnected in a Clos topology, and each device establishes a single-hop eBGP session with each of its physically connected peers. The sessions come up as eBGP because of the ASN allocation scheme; it is common practice to assign a unique ASN to every leaf switch (TOR) in a cluster and a different unique ASN to the set of spine switches to which those TORs are connected. The allocated ASNs are typically private ASNs in the range 4200000000 to 4294967294, although this is not always the case.

For this type of configuration, BGP unnumbered peering can be a useful solution. BGP unnumbered peering is the dynamic setup of one or more single-hop BGP sessions over a network segment that has no globally-unique IPv4 or IPv6 addresses. Each router connected to the network segment is assumed to have an IPv6-enabled interface to the network, and these interfaces have IPv6 link-local addresses that are typically auto-generated by each router from the interface MAC addresses.
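
As an illustration of how a link-local address can be generated from a MAC address, the following Python sketch applies the modified EUI-64 method from RFC 4291. Whether a given router uses exactly this derivation is an assumption, and the MAC address in the usage note is hypothetical.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 address from a MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80 prefix, 48 zero bits, then the 64-bit interface identifier.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)
```

Under this method, the hypothetical MAC address 7c:fe:90:fc:7a:d8 maps to fe80::7efe:90ff:fefc:7ad8.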

How sessions are established using BGP unnumbered peering

The set of BGP speakers configured for BGP unnumbered peering on a network segment discover each other by sending and receiving ICMPv6 router advertisement (RA) messages.

Consider an example of Router A and Router B, which are both connected to an unnumbered interface and configured for BGP unnumbered dynamic session setup. The BGP session between the two routers is established in the following sequence:

  1. Router B sends an ICMPv6 RA message on its interface b1.

    Assuming the RA message is unsolicited, the source IP address of this message is the link-local address of interface b1 (fe80::7efe:90ff:fefc:7ad8), and the destination IP address is the all-nodes multicast address.

  2. Asynchronously, Router A sends an ICMPv6 RA message on its interface a1.

    The source IP address is the link-local address of interface a1 (fe80::7efe:90ff:fefc:7bd8), and the destination IP address is the all-nodes multicast address.

  3. Router A receives the RA message on interface a1, and the software process responsible for ICMPv6 relays the information to BGP, because in the BGP configuration, a1 is a subinterface that is configured as a dynamic neighbor interface; that is, added to the BGP dynamic-neighbors interface list.
  4. BGP checks if it already has a BGP session with fe80::7efe:90ff:fefc:7ad8.
    • If BGP already has this session and it is up, or BGP is in the process of establishing this session, then the new information is a no-op. Possibly Router B started the same process moments before Router A.

    • If BGP does not have a session with this link-local address, then a new TCP connection is initiated toward fe80::7efe:90ff:fefc:7ad8.

  5. When the TCP connection is established, the BGP OPEN message sent by Router A encodes a local-AS and other capabilities that come from the configuration of the peer-group associated with interface a1.
  6. Router A receives a BGP OPEN message from Router B and accepts that OPEN message, proceeding to move toward the BGP established state, if the OPEN message encodes an acceptable peer AS number (in one of the allowed-peer-as ranges configured for interface a1). The address families supported by the session are based on the usual MP-BGP negotiation.
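
Steps 3 and 4 of the sequence above can be sketched as a small decision function in Python. The function name, the return values, and the use of a dictionary to track sessions are assumptions made for the illustration; the sketch does not model the TCP connection or the OPEN exchange.

```python
def handle_router_advertisement(
    interface: str,
    source: str,
    dynamic_interfaces: set,
    sessions: dict,
) -> str:
    """Decide what BGP does when an RA arrives on an interface.

    sessions maps a peer link-local address to its state, for example
    "connecting" or "established".
    """
    if interface not in dynamic_interfaces:
        return "ignore"   # not in the dynamic-neighbors interface list
    if source in sessions:
        return "no-op"    # session is already up or being established
    sessions[source] = "connecting"
    return "connect"      # initiate a TCP connection toward the source
```

In the Router A example, the first RA from fe80::7efe:90ff:fefc:7ad8 on a1 triggers a connection attempt, and later RAs from the same address are a no-op.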

BGP dynamic-neighbors interface list

To enable dynamic peering, you add subinterfaces to the BGP dynamic-neighbors interface list in the SR Linux configuration.

When a subinterface is added to the dynamic-neighbors interface list:
  • BGP automatically accepts incoming BGP connections to the IPv6 link-local address of that subinterface, subject to the configured max-sessions limit for the subinterface.

    For the connection to be accepted, the source address must be an IPv6 link-local address (that may or may not also be a defined neighbor address), and the reported ASN of the peer must match the relevant configuration. If the source address does not match a configured neighbor address, the session is set up according to the peer-group associated with the subinterface. This applies even if a dynamic-neighbors accept match-prefix entry matches the source IPv6 link-local address; the peer-group associated with that entry is not used.

  • BGP registers for IPv6 RA messages on the subinterface. Whenever the source of one of these RA messages matches an IPv6 link-local address for which there is currently no established BGP session, the system attempts to create a BGP session to that address, as long as this does not exceed the configured max-sessions limit for the subinterface. The session is set up according to the configured peer-group associated with the subinterface.

When a BGP session is established over a subinterface in the dynamic-neighbors interface list:
  • Changes to the allowed-peer-as ranges associated with the subinterface have no effect until the next time BGP attempts to establish the sessions.
  • Non-arrival of expected ICMPv6 RA messages on the subinterface is a no-op and does not trigger teardown of associated sessions.
  • Existing triggers for tearing down a session apply as normal (for example, hold-timer expiration, BFD timeout, clear bgp neighbor commands, and so on).
  • If the link-local address of a dynamic peer is configured as a static neighbor address, the dynamic session is immediately torn down and replaced by the static session.

When a subinterface is deleted from the dynamic-neighbors interface list, all dynamic sessions associated with that subinterface (excluding sessions set up by static configuration of the neighbor) are torn down immediately.

A BGP session that was previously established on an unnumbered interface and subsequently torn down can only be re-established if the subinterface is configured in the dynamic-neighbors interface list and a recent ICMPv6 RA message is received.

Configuration overrides for dynamic peers on unnumbered interfaces

When a dynamic BGP session is initiated or accepted on an interface that is tied to a peer-group, most of the parameters relevant to that session come from the configuration of that peer-group, with the following exceptions:

  • multihop maximum-hops is always 1 (for both eBGP and iBGP peers).
  • transport local-address is always the link-local address of the specified interface.
  • next-hop-self is always true. The neighbor is not presumed to have reachability to off-link destinations.
  • transport passive-mode is always false. BGP always initiates a connection when informed by ICMPv6, unless it already has a connection.
  • ipv4-unicast ipv4-unicast receive-ipv6-next-hops is always enabled.
  • ipv4-unicast ipv4-unicast advertise-ipv6-next-hops and evpn advertise-ipv6-next-hops are always enabled.
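
One way to picture the overrides listed above is as a fixed set of values layered on top of the peer-group settings. The following Python sketch illustrates that idea only; the dictionary keys are made-up names for readability and are not SR Linux YANG paths.

```python
def effective_session_params(peer_group_params, link_local_address):
    """Return session parameters for a dynamic peer on an unnumbered interface.

    peer_group_params: settings taken from the associated peer-group (dict)
    link_local_address: link-local address of the interface
    """
    # Values that always apply, regardless of the peer-group configuration.
    forced = {
        "multihop_maximum_hops": 1,
        "transport_local_address": link_local_address,
        "next_hop_self": True,
        "transport_passive_mode": False,
        "ipv4_unicast_receive_ipv6_next_hops": True,
        "ipv4_unicast_advertise_ipv6_next_hops": True,
        "evpn_advertise_ipv6_next_hops": True,
    }
    params = dict(peer_group_params)  # copy so the peer-group settings are untouched
    params.update(forced)             # forced values win over peer-group values
    return params
```

Any peer-group setting that is not in the forced set, such as a description or timer value, passes through unchanged.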

Peer AS validation for dynamic peers on unnumbered interfaces

When a BGP OPEN message is received from an unnumbered dynamic neighbor, the reported AS number of the peer is checked to determine if it is acceptable to allow the peering to proceed.

For a dynamic session associated with a subinterface, the peer AS is acceptable only if it matches one of the allowed-peer-as elements of the dynamic-neighbors interface list entry for the subinterface, or if the peer AS is equal to the local AS (implying an iBGP session).
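
The acceptance rule above can be expressed as a short check. The following Python sketch is illustrative only; it assumes that each allowed-peer-as element can be represented as an inclusive (low, high) range, with a single ASN written as a range of one value.

```python
def peer_as_acceptable(peer_as, local_as, allowed_peer_as):
    """Return True if the AS number reported in the OPEN message is acceptable.

    allowed_peer_as: iterable of (low, high) inclusive ASN ranges (assumed representation)
    """
    if peer_as == local_as:
        return True  # same AS as the local router, which implies an iBGP session
    return any(low <= peer_as <= high for low, high in allowed_peer_as)
```

With allowed-peer-as set to 4294967200 only, a peer reporting AS 4294967200 is accepted, a peer reporting the local AS is accepted as iBGP, and any other AS is rejected.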

Configuring BGP unnumbered peering

To configure BGP unnumbered peering, you add subinterfaces to the BGP dynamic-neighbors interface list, and specify the peer autonomous system numbers from which incoming TCP connections to the BGP well-known port are accepted.

The following example adds a subinterface to the BGP dynamic-neighbors interface list.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp dynamic-neighbors interface ethernet-1/1.1
    network-instance default {
        protocols {
            bgp {
                dynamic-neighbors {
                    interface ethernet-1/1.1 {
                        peer-group bgp_peer_group_0
                        allowed-peer-as [
                            4294967200
                        ]
                    }
                }
            }
        }
    }

In this example, subinterface ethernet-1/1.1 is added to the BGP dynamic-neighbors interface list. This subinterface must be enabled for IPv6 and configured to accept and send IPv6 RA messages. It does not require any IPv4 addresses or global-unicast IPv6 addresses.

Incoming TCP connections to port 179 received on this subinterface that are sourced from an IPv6 link-local address and destined for the IPv6 link-local address of this subinterface are automatically accepted. IPv6 RA messages received on this subinterface automatically trigger BGP session setup toward the sender of these messages, if there is not already an established BGP session.

Peer group bgp_peer_group_0 is associated with dynamic BGP neighbors on this subinterface. Parameters configured for this peer-group are used for establishing the dynamic BGP session, with the exceptions described in Configuration overrides for dynamic peers on unnumbered interfaces.

ASN 4294967200 is configured as an allowed peer AS for dynamic BGP neighbors on this subinterface. If the BGP OPEN message from a peer on this subinterface contains a MyAS number that is not an allowed peer AS, then a NOTIFICATION is sent to the peer with the indication Bad Peer AS.

Prefix-limit for BGP peers

SR Linux places a limit on the number of IPv4, IPv6, or EVPN route prefixes that can be received from a peer or from individual members of a peer group. When this prefix-limit is exceeded, SR Linux tears down the BGP session with the peer, then re-establishes the session.

You can configure the following settings for the prefix-limit:

  • max-received-routes

    By default, the prefix-limit that triggers a BGP session teardown is 4294967295 routes, which is the maximum number of routes that can be received from the peer (counting routes accepted and rejected by import policy). You can configure a different prefix-limit by setting a value for max-received-routes.

  • prefix-limit-restart-timer

    By default, after a BGP session is torn down because the prefix limit was exceeded, the BGP session is re-established immediately. You can configure the number of seconds the system waits before re-establishing the session by setting a value for prefix-limit-restart-timer.

  • prevent-teardown

    You can prevent the BGP session from being torn down when the prefix-limit is exceeded by setting prevent-teardown to true.

  • warning-threshold-pct

    You can set a warning threshold for the prefix-limit. When the number of routes received from the peer (counting routes accepted and rejected by import policy) reaches a specified percentage of the max-received-routes setting, BGP raises a warning log event. The default threshold is 90%.
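
The interaction of these settings can be sketched as follows. This Python fragment is a simplified model, not SR Linux behavior in full: the function name and return values are hypothetical, and it assumes that the limit is exceeded when the received count is strictly greater than max-received-routes and that the warning applies when the count reaches the threshold percentage.

```python
def evaluate_prefix_limit(received, max_received_routes=4294967295,
                          warning_threshold_pct=90, prevent_teardown=False):
    """Return the actions implied by the current received-route count.

    received counts routes both accepted and rejected by import policy.
    """
    actions = []
    # Warning: the count has reached the configured percentage of the limit.
    if received * 100 >= max_received_routes * warning_threshold_pct:
        actions.append("warning")
    # Teardown: the limit is exceeded and teardown is not prevented.
    if received > max_received_routes and not prevent_teardown:
        actions.append("teardown")
    return actions
```

For example, with max-received-routes set to 1000 and the default 90% threshold, 900 received routes produce only a warning, while 1001 routes also produce a teardown unless prevent-teardown is true.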

When upgrading from a release earlier than 23.3.1 to Release 23.3.1 or later, the upgrade script checks the configured max-received-routes setting for IPv4 and IPv6 routes. If the configured max-received-routes setting is equal to 4294967295 for IPv4 or IPv6 routes, then prevent-teardown for IPv4 or IPv6 routes is set to true.

Configuring the prefix-limit for BGP peers

To configure the prefix-limit, you can set the maximum number of routes from a peer, set the number of seconds the system waits to re-establish a session following a teardown, and disable the prefix-limit for a peer.

The commands to set maximum number of routes from a peer and disable the prefix-limit can be applied to IPv4 and IPv6 routes. The settings can be applied to a specific peer or to a peer group. If there is no setting for a specific peer, the setting for the group applies. If there is no setting for the peer and group, the system default applies.

Configure maximum number of routes from a peer

The following example configures the maximum number of IPv4 routes that can be received from a peer.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 afi-safi ipv4-unicast ipv4-unicast 
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    afi-safi ipv4-unicast {
                        ipv4-unicast {
                            prefix-limit {
                                max-received-routes 30000000
                            }
                        }
                    }
                }
            }
        }
    }

If max-received-routes is not configured for the peer, the max-received-routes setting for the peer group applies. If max-received-routes is not configured for the peer group, the system default maximum of 4294967295 routes applies.

Configure prefix-limit restart timer

The following example sets the number of seconds the system waits to re-establish a BGP session with a peer after the session was torn down because the max-received-routes value was exceeded.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 timers
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    timers {
                        prefix-limit-restart-timer 60
                    }
                }
            }
        }
    }

If the prefix-limit-restart-timer is not configured for the peer, the prefix-limit-restart-timer setting for the group applies. If the prefix-limit-restart-timer is not configured for the group, the BGP session with the peer is re-established immediately after teardown (that is, prefix-limit-restart-timer = 0 seconds).

Disable the prefix-limit

The following example disables the prefix-limit for IPv4 routes received from a peer, so that the BGP session is not torn down if the maximum number of IPv4 routes received from the peer is exceeded.

--{ candidate shared default }--[  ]--
# info network-instance default protocols bgp neighbor 192.168.11.1 afi-safi ipv4-unicast ipv4-unicast   
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.11.1 {
                    afi-safi ipv4-unicast {
                        ipv4-unicast {
                            prefix-limit {
                                prevent-teardown true
                            }
                        }
                    }
                }
            }
        }
    }

BGP configuration management

Managing the BGP configuration on SR Linux can include the following tasks:

  • Modifying an AS number
  • Deleting a BGP neighbor from the configuration
  • Deleting a BGP group
  • Resetting BGP peer connections

Modifying an ASN

You can modify the ASN on the router, but the new ASN does not take effect until the BGP instance is restarted, either by administratively disabling/enabling the BGP instance, or by rebooting the system with the new configuration.

--{ * candidate shared default }--[ network-instance default ]--
# protocols bgp autonomous-system 95002
# protocols bgp admin-state disable
# protocols bgp admin-state enable

All established BGP sessions are taken down when the BGP instance is disabled.

Deleting a neighbor

Use the delete command to delete a BGP neighbor from the configuration.

--{ * candidate shared default }--[ network-instance default ]--
# delete protocols bgp neighbor 192.168.11.1

Deleting a group

Use the delete command to delete the settings for a BGP peer group from the configuration.

--{ * candidate shared default }--[ network-instance default ]--
# delete protocols bgp group headquarters1

Resetting BGP peer connections

To refresh the connections between BGP neighbors, you can issue a hard or soft reset. A hard-reset tears down the TCP connections and returns to IDLE state. A soft-reset sends route-refresh messages to each peer. The hard or soft reset can be issued to a specific peer, to peers in a specific peer-group, or to peers with a specific ASN.

Issue a hard reset

The following command hard-resets the connections to the BGP neighbors in a peer group that have a specified ASN. The hard reset applies both to configured peers and dynamic peers.

# tools network-instance default protocols bgp group headquarters1 reset-peer peer-as 95002
/network-instance[name=default]/protocols/bgp/group[group-name=headquarters1]:
    Successfully executed the tools clear command.

Issue a soft reset

The following command soft-resets the connection to BGP neighbors that have a specified ASN. The soft reset applies both to configured peers and dynamic peers.

# tools network-instance default protocols bgp soft-clear peer-as 95002
/network-instance[name=default]/protocols/bgp:
    Successfully executed the tools clear command.

BGP shortcuts

With BGP shortcuts, SR Linux can include LDP LSPs or segment routing (SR-ISIS) tunnels in the BGP algorithm calculations. In this case, tunnels operate as logical interfaces directly connected to remote nodes in the network. Because the BGP algorithm treats the tunnels in the same way as a physical interface (being a potential output interface), the algorithm can select a destination node together with an output tunnel to resolve the next-hop, using the tunnel as a shortcut through the network to the destination.

Note: BGP shortcuts can only be used for next-hop resolution of IPv4-unicast RIB-Ins with an IPv4 next-hop address.

BGP next-hop resolution describes the procedures that BGP uses to resolve the next-hop address of each BGP RIB-In that forms part of a BGP route. The following table defines BGP RIB-In and BGP route in the context of BGP next-hop resolution.

Table 1. BGP RIB-In and BGP route
BGP term | Definition
BGP RIB-In
  One of the following:
  • a received IPv4-unicast BGP route with an IPv4 next-hop address
  • a received IPv4-unicast BGP route with an IPv6 next-hop address (allowed as a result of sending an extended-nh-encoding capability to the peer)
  • a received IPv6-unicast BGP route with an IPv6 next-hop address
BGP route
  A route submitted by BGP to the fib_mgr that resulted from the grouping of one or more BGP RIB-Ins. (Multiple BGP RIB-Ins per route describes a multipath scenario.)

With BGP shortcuts enabled, next-hop resolution determines whether to use a local interface or a tunnel to resolve the BGP next-hop.

Tunnel resolution mode

As part of the configuration for BGP shortcuts, you must define the tunnel-resolution mode (prefer, require, or disabled). This mode determines the order of preference and fallback of using tunnels in the tunnel table to resolve the next-hop instead of using routes in the FIB, as described in the following sections.

Next-Hop Resolution of IPv4-Unicast RIB-Ins with IPv4 next-hop

The following table describes the next-hop resolution steps for IPv4-Unicast RIB-Ins with IPv4 next-hops, depending on the specified tunnel resolution mode.

Table 2. Next-hop resolution for IPv4 Unicast RIB-Ins with IPv4 next-hop address
Tunnel resolution mode | Next-hop resolution steps in BGP
prefer
  1. Start with TTM lookup:
    1. Find all the tunnels in TTM with an endpoint that matches the BGP next-hop address and that have one of the types listed in the allow list.
    2. If there is a single tunnel, select that tunnel. The RIB-In is resolved; exit.
    3. If there are multiple tunnels, select the tunnel with the numerically lowest TTM preference and, if a further tie-break is needed, the tunnel with the lowest TTM metric. The RIB-In is resolved; exit.
  2. If there are no tunnels, fall back to FIB lookup:
    1. Find the longest-match active route in the FIB that matches the BGP next-hop address. There are presently no restrictions on this route; it can be an IGP route, a static blackhole route, a default route, or another BGP route.
    2. If there is a longest-match route and it eventually resolves to a blackhole next-hop, interface, or tunnel, then the RIB-In is resolved; exit.
    3. If there is no matching route, the RIB-In is unresolved.
require
  Perform TTM lookup only, as described in step 1 for prefer. If there is no matching tunnel, the RIB-In is unresolved.
disabled
  Perform FIB lookup only, as described in step 2 for prefer.
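
The resolution steps in the table above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the SR Linux implementation: the tunnel and route dictionaries, their keys, and the function names are hypothetical, and every route in the fib list is assumed to be active and to resolve eventually to a blackhole next-hop, interface, or tunnel.

```python
import ipaddress


def ttm_lookup(next_hop, tunnels, allowed_types):
    """TTM lookup: lowest preference wins, then lowest metric."""
    candidates = [t for t in tunnels
                  if t["endpoint"] == next_hop and t["type"] in allowed_types]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (t["preference"], t["metric"]))


def fib_lookup(next_hop, fib):
    """FIB lookup: longest-prefix match on the next-hop address."""
    addr = ipaddress.ip_address(next_hop)
    matches = [r for r in fib if addr in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


def resolve_next_hop(next_hop, mode, tunnels, fib, allowed_types):
    """Return ("tunnel", t), ("route", r), or None if the RIB-In is unresolved."""
    if mode not in ("prefer", "require", "disabled"):
        raise ValueError("unknown tunnel-resolution mode: %s" % mode)
    if mode in ("prefer", "require"):
        tunnel = ttm_lookup(next_hop, tunnels, allowed_types)
        if tunnel is not None:
            return ("tunnel", tunnel)
        if mode == "require":
            return None  # no fallback to the FIB in require mode
    route = fib_lookup(next_hop, fib)
    return ("route", route) if route is not None else None
```

In this sketch, prefer mode returns a tunnel when one matches and otherwise falls back to the longest-match route, require mode never consults the FIB, and disabled mode never consults the tunnel table.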

Next-Hop Resolution of IPv4-Unicast and IPv6-Unicast RIB-Ins with IPv6 next-hop

If the next-hop address for the IPv4-Unicast RIB-In is an IPv6 address, the next-hop is resolved by the longest prefix match IPv6 route in the FIB. This is the only option because there are no IPv6 tunnels in the TTM. The same logic applies to BGP RIB-Ins with IPv6-unicast NLRI address family as they can only have an IPv6 next-hop address. The next-hop resolution logic is the same as the FIB lookup described in the preceding table.

Configuring BGP shortcuts over segment routing

This task describes how to configure BGP shortcuts.

  1. In the default network instance, define the tunnel-resolution mode for the BGP protocol.
    This setting determines the order of preference and the fallback when using tunnels in the tunnel table instead of routes in the FIB. Available options are as follows:
    • require

      requires tunnel table lookup instead of FIB lookup

    • prefer

      prefers tunnel table lookup over FIB lookup

    • disabled (default)

      performs FIB lookup only

  2. Set the allowed tunnel types for next-hop resolution.

Configure IPv4 BGP shortcuts

The following example shows the BGP next-hop resolution configuration to allow IPv4 SR-ISIS tunnels, with the tunnel mode set to prefer.

--{ * candidate shared default }--[ ]--
# info network-instance default protocols bgp afi-safi ipv4-unicast ipv4-unicast next-hop-resolution ipv4-next-hops tunnel-resolution
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv4-unicast {
                    ipv4-unicast {
                        next-hop-resolution {
                            ipv4-next-hops {
                                tunnel-resolution {
                                    mode prefer
                                    allowed-tunnel-types [
                                        sr-isis
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    }

Configure IPv6 BGP shortcuts

The following example shows the BGP next-hop resolution configuration to allow IPv6 SR-ISIS tunnels, with the tunnel mode set to prefer.

--{ * candidate shared default }--[ ]--
# info network-instance default protocols bgp afi-safi ipv6-unicast ipv6-unicast next-hop-resolution ipv6-next-hops tunnel-resolution
    network-instance default {
        protocols {
            bgp {
                afi-safi ipv6-unicast {
                    ipv6-unicast {
                        next-hop-resolution {
                            ipv6-next-hops {
                                tunnel-resolution {
                                    mode prefer
                                    allowed-tunnel-types [
                                        sr-isis
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    }

BGP TCP MSS

BGP uses TCP transport, and BGP messages are carried as TCP segments. SR Linux allows you to control the Maximum Segment Size (MSS) for each TCP segment based on the Path MTU discovery settings.

Path MTU discovery can be enabled or disabled per network instance in SR Linux. The default is enabled.

Within the BGP hierarchy, path MTU discovery can be enabled and disabled at different configuration levels. The supported configuration paths are:
  • network-instance.protocols.bgp.transport.mtu-discover
  • network-instance.protocols.bgp.group.transport.mtu-discover
  • network-instance.protocols.bgp.neighbor.transport.mtu-discover

By default, BGP path MTU discovery inherits the value from the network instance for all BGP sessions. This value can be overridden at any of the BGP configuration levels listed above. When an ICMP fragmentation-needed message is received and BGP path MTU discovery is enabled, the system reduces the MTU for the BGP session according to the ICMP message, subject to the lower bound configured under the system-level min-path-mtu. The following example shows the network-instance and system-level settings.

--{ * candidate shared default }--[ ]--
# info network-instance default 
    network-instance default {
        mtu {
            path-mtu-discovery true
        }
    } 
--{ * candidate shared default }--[ ]--
# info system mtu 
    system {
        mtu {
            min-path-mtu 552
        }
    }

Configuring BGP TCP MSS

The maximum size of each TCP segment is controlled by configuring the TCP MSS (tcp-mss) value.

SR Linux supports configuring TCP MSS at BGP instance, group, and neighbor configuration levels. The supported range for the tcp-mss value is 536-9446 bytes, and the default value is 1024 bytes.

The tcp-mss value is inherited down the configuration levels within the BGP hierarchy. If no tcp-mss is configured for a BGP neighbor, the tcp-mss value is taken from the BGP group, if it is configured there, or else is taken from the BGP instance. The default BGP instance tcp-mss value is used if neither the BGP group nor the neighbor has a configured tcp-mss.

If the configured or inherited tcp-mss value is higher than the BGP path MTU value, the tcp-mss value is ignored, and the BGP path MTU value is used as the operational TCP MSS.
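
The inheritance and path MTU rules above can be summarized in a short Python sketch. It is illustrative only: the function and parameter names are hypothetical, None stands for a level where tcp-mss is not configured, and the default of 1024 bytes is the value stated in this section.

```python
DEFAULT_TCP_MSS = 1024  # default BGP instance value, in bytes


def operational_tcp_mss(path_mtu, neighbor_mss=None, group_mss=None, instance_mss=None):
    """Return the operational TCP MSS for a BGP session.

    The most specific configured level wins: neighbor, then group, then instance.
    """
    configured = DEFAULT_TCP_MSS
    for value in (neighbor_mss, group_mss, instance_mss):
        if value is not None:
            configured = value
            break
    # A tcp-mss value higher than the path MTU is ignored in favor of the path MTU.
    return path_mtu if configured > path_mtu else configured
```

For example, a neighbor value of 1012 takes precedence over a group value of 1024, and a group value of 2000 with a path MTU of 1500 yields an operational value of 1500.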

Configuring BGP session tcp-mss

The following example configures the BGP instance tcp-mss value.

info from state network-instance default protocols bgp transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                transport {
                    tcp-mss 1024
                }
            }
        }
    }

Configuring BGP group tcp-mss

The following example configures the BGP group tcp-mss.

info from state network-instance default protocols bgp group test transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                group test {
                    transport {
                        tcp-mss 1024
                    }
                }
            }
        }
    }

Configuring BGP neighbor tcp-mss

The following example configures the BGP neighbor tcp-mss.

info from state network-instance default protocols bgp neighbor 192.168.0.1 transport tcp-mss
    network-instance default {
        protocols {
            bgp {
                neighbor 192.168.0.1 {
                    transport {
                        tcp-mss 1012
                    }
                }
            }
        }
    }

If the configured or inherited tcp-mss value is higher than the operational path MTU value, the tcp-mss value is ignored and the path MTU value is used as the operational TCP MSS.

Error handling for BGP update messages

BGP update messages are used to transfer routing information between BGP peers. Errors in some BGP update messages are considered critical; for example, if the network layer reachability information (NLRI) cannot be extracted and parsed from an update message, it is a critical error. Errors in other BGP update messages are considered non-critical; for example, errors such as incorrect attribute flag settings, missing mandatory path attributes, incorrect next-hop length or format, and so on, are non-critical errors.

In SR Linux, critical errors in BGP update messages trigger a session reset. Non-critical errors are handled using the treat-as-withdraw or attribute-discard approaches to error handling. This error-handling behavior for BGP update messages is not configurable in SR Linux.