BGP Optimal Route Reflection for Non-Hierarchical Networks

This chapter provides information about BGP optimal route reflection for non-hierarchical networks.

Applicability

This chapter was initially written based on SR OS Release 15.0.R4, but the CLI in the current edition corresponds to SR OS Release 23.7.R2.

Overview

BGP route reflectors are used in many networks. They improve network scalability by eliminating or reducing the need for a full-mesh of IBGP sessions.

When a BGP route reflector receives multiple paths for the same IP destination, it normally selects a single best path, based on its own location in the routing domain, and reflects that path to all clients in the domain. In Centralized route reflection, the centralized route reflector (RR) for ISP-1 is located in the datacenter (DC) and receives prefix X from ISP-2 through PE-2 in point of presence PoP-1 and also through PE-3 in PoP-2. Because RR is closer to PoP-1 than to PoP-2, it selects the path through PE-2 as the best path and reflects it to the remaining route reflector clients, so the traffic to destination X flows as indicated. As a result, hot-potato routing, which sends traffic to another autonomous system (AS) through the closest possible exit point from the local AS, cannot be achieved.

Figure 1. Centralized route reflection

Hot-potato routing can be achieved when the route reflector selects and reflects multiple best paths, each calculated for a different subdomain from the point of view of a client in that subdomain, as outlined in RFC 9107, BGP Optimal Route Reflection (ORR). This requires the route reflector to know the topology of each subdomain. In Centralized route reflection with ORR, the route reflector calculates the best path for PoP-1 and reflects it to the clients in PoP-1 (PE-1); it also calculates the best path for PoP-2 and reflects it to the clients in PoP-2 (PE-4).

Figure 2. Centralized route reflection with ORR

If the routing domain is non-hierarchical, the route reflector is part of the routing domain and therefore has a view of the entire topology through the interior gateway protocol (IGP).

If the routing domain is hierarchical, the route reflector needs to extract the link state database (LSDB) from the subdomain it is not part of, which is achieved through BGP link state (BGP-LS). The use of BGP-LS allows the route reflector to learn the IGP topology information for OSPF areas and IS-IS levels in which the route reflector is not a direct participant. See the BGP Optimal Route Reflection for Hierarchical Networks chapter if the network topology is hierarchical.

ORR CLI commands

The BGP optimal-route-reflection context defines the shortest path first (SPF) parameters and one or more locations.

*A:RR-5>config>router>bgp# optimal-route-reflection ?
  - optimal-route-reflection

      location        + Configure location ID for route reflector
 [no] spf-wait        - Configure the spf-wait parameters

The SPF calculation is configurable with the spf-wait command. Initial-wait and second-wait are optional arguments. These timers define when to initiate the first, second, and subsequent SPF runs after a topology change occurs.
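
The timer behavior can be illustrated with a short Python sketch. It assumes a back-off pattern in which the first SPF run waits initial-wait, the second run waits second-wait, and each subsequent run doubles the previous wait until it is capped by max-wait. This doubling is an assumption modeled on the usual SR OS spf-wait behavior and should be verified against the command reference; the function name and the first set of timer values are hypothetical.

```python
def spf_wait_schedule(max_wait, initial_wait, second_wait, runs):
    """Return the assumed wait (in seconds) before each of `runs` consecutive SPF runs."""
    waits = []
    for n in range(runs):
        if n == 0:
            wait = initial_wait                  # first run after a topology change
        else:
            wait = second_wait * 2 ** (n - 1)    # assumed doubling from the second run onward
        waits.append(min(wait, max_wait))        # never wait longer than max-wait
    return waits


# Hypothetical timer values
print(spf_wait_schedule(max_wait=60, initial_wait=5, second_wait=15, runs=5))
# [5, 15, 30, 60, 60]

# Timer values used in the ORR example later in this chapter
print(spf_wait_schedule(max_wait=1, initial_wait=1, second_wait=1, runs=4))
# [1, 1, 1, 1]
```

With all three timers set to 1, as in the ORR example later in this chapter, every SPF run is scheduled one second after its trigger under this model.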

*A:RR-5>config>router>bgp>orr# spf-wait ?
  - spf-wait <max-wait> [initial-wait <initial-wait>] [second-wait <second-wait>]

 <max-wait>           : [1..600] in seconds
 <initial-wait>       : [1..300] in seconds
 <second-wait>        : [1..300] in seconds

Multiple locations can be created in the optimal-route-reflection context, as follows. Each location is identified by a location ID in the range 1 to 255 and contains a primary IP address and, optionally, a secondary and a tertiary IP address for redundancy. These addresses must correspond to loopback or system IP addresses of routers participating in the IGP, and are used as the starting point (or seed) for the SPF calculation. Because all clients in the same location receive the same optimal path for that location, these addresses must be close to the clients in that part of the network.

*A:RR-5>config>router>bgp>orr# location ?
  - location <location-id> [primary-ip-address <ipv4-address>] [secondary-ip-address <ipv4-address>]
    [tertiary-ip-address <ipv4-address>]

 <location-id>        : 1..255


 [no] primary-ip-add* - Configure Primary IP address for location ID
 [no] primary-ipv6-a* - Configure Primary IPv6 address for location ID
 [no] secondary-ip-a* - Configure Secondary IP address for location ID
 [no] secondary-ipv6* - Configure Secondary IPv6 address for location ID
 [no] tertiary-ip-ad* - Configure Tertiary IP address for location ID
 [no] tertiary-ipv6-* - Configure Tertiary IPv6 address for location ID
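
The redundancy between the primary, secondary, and tertiary addresses can be pictured with the following Python sketch. It assumes that the route reflector uses the first configured address, in that order, that is reachable in the IGP; this selection order is an assumption for illustration, not a statement of the SR OS implementation. The function name and the data structures are hypothetical; the addresses are the ones used for location 2 later in this chapter.

```python
def active_seed(location, reachable):
    """Return (role, address) of the first configured address found in `reachable`, or None."""
    for role in ("primary", "secondary", "tertiary"):
        address = location.get(role)
        if address is not None and address in reachable:
            return role, address
    return None  # no seed available, so no SPF can be run for this location


location_2 = {"primary": "192.0.2.44", "secondary": "192.0.2.33"}

# Both loopbacks are known in the IGP: the primary address is the seed
print(active_seed(location_2, {"192.0.2.33", "192.0.2.44"}))
# ('primary', '192.0.2.44')

# The primary loopback is no longer reachable: the secondary address takes over
print(active_seed(location_2, {"192.0.2.33"}))
# ('secondary', '192.0.2.33')
```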

The locations are then referred to with the cluster command (residing in the BGP group context) through the orr-location argument, as follows.

*A:RR-5>config>router>bgp>group# cluster ?
  - cluster <cluster-id> orr-location <orr-location> [allow-local-fallback]
  - cluster <cluster-id>
  - no cluster

 <cluster-id>         : expressed in dotted decimal format (a.b.c.d)
 <orr-location>       : [1..255]
 <allow-local-fallb*> : configure to allow fallback on default orr location


*A:RR-5>config>router>bgp>group# neighbor 192.0.2.3 cluster ?
  - cluster <cluster-id> orr-location <orr-location> [allow-local-fallback]
  - cluster <cluster-id>
  - no cluster

 <cluster-id>         : expressed in dotted decimal format (a.b.c.d)
 <orr-location>       : [1..255]
 <allow-local-fallb*> : configure to allow fallback on default orr location

The location ID is referred to in the orr-location argument of the cluster command. Typically, the cluster command applies to a BGP peer group, and all neighbors in that group share the same location ID, unless the cluster command is also configured at the neighbor level. The allow-local-fallback option allows the RR to advertise the best reachable BGP path using its own location, but only when no BGP routes are reachable from the configured ORR location. Without this option, no path is advertised to the clients in that location in that case.
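
The following Python sketch restates the two rules above: a neighbor-level setting takes precedence over the group-level setting, and allow-local-fallback decides what is advertised when no path is reachable from the ORR location. It is a simplified model of the described behavior, not the SR OS implementation; the function names and arguments are hypothetical.

```python
def effective_location(group_location, neighbor_location=None):
    """A location configured at neighbor level overrides the group-level location."""
    return neighbor_location if neighbor_location is not None else group_location


def path_to_advertise(location_best, rr_best, allow_local_fallback):
    """Pick the path reflected to the clients of one ORR location.

    location_best: best path reachable from the ORR location, or None
    rr_best:       best path reachable from the route reflector's own location, or None
    """
    if location_best is not None:
        return location_best
    # No path is reachable from the ORR location
    return rr_best if allow_local_fallback else None


print(effective_location(group_location=1))                       # 1
print(effective_location(group_location=1, neighbor_location=2))  # 2

print(path_to_advertise("via 192.0.2.4", "via 192.0.2.1", False))  # via 192.0.2.4
print(path_to_advertise(None, "via 192.0.2.1", True))              # via 192.0.2.1
print(path_to_advertise(None, "via 192.0.2.1", False))             # None
```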

Properties

The following properties apply to ORR in SR OS:

  • ORR is supported in the Base router BGP instance.

  • ORR is supported for the IPv4, label-IPv4, label-IPv6, VPN-IPv4, and VPN-IPv6 address families.

  • ORR is supported with add-paths, meaning that add-paths advertised to ORR clients are also ORR location-based.

Configuration

Example non-hierarchical networking using IS-IS shows the example topology. IS-IS is used as the IGP for AS 65536, with RR-5 taking the role of the route reflector for clients PE-1 to PE-4. Additionally, ASBR-6 in AS 65537 peers with PE-1, and ASBR-7 in AS 65538 peers with PE-4.

Figure 3. Example non-hierarchical networking using IS-IS

The initial configuration on all nodes includes:

  • Cards, MDAs, and ports

  • Router interfaces

  • IS-IS as IGP on all interfaces within AS 65536, in a non-hierarchical way (alternatively, OSPF can be used), and traffic engineering enabled

The basic IS-IS configuration is very similar for all routers, including the route reflector. The RR-5 configuration is as follows:

# on RR-5:
configure
    router Base
        isis 0
            area-id 49.0001
            traffic-engineering
            interface "system"
                no shutdown
            exit
            interface "int-RR-5-PE-2"
                interface-type point-to-point
                no shutdown
            exit
            interface "int-RR-5-PE-3"
                interface-type point-to-point
                no shutdown
            exit
            no shutdown
        exit

Route reflection without ORR

RR-5 peers with clients PE-1 to PE-4, and because RR-5 is the route reflector, the cluster command is added, defining the cluster ID attribute value to use. The configuration for RR-5 is as follows:

# on RR-5:
configure
    router Base
        autonomous-system 65536
        bgp
            loop-detect discard-route
            split-horizon
            group "IBGP"
                cluster 192.0.2.5
                peer-as 65536            # type internal
                neighbor 192.0.2.1
                exit
                neighbor 192.0.2.2
                exit
                neighbor 192.0.2.3
                exit
                neighbor 192.0.2.4
                exit
            exit
            no shutdown
        exit

PE-1 belongs to the cluster defined in the route reflector, so it does not need to be fully meshed with the other routers in the area; peering with the route reflectors in the area is sufficient for PE-1 to receive updates. Typically, two route reflectors are provisioned for redundancy, but that does not apply in this example. PE-1 also peers with ASBR-6 in AS 65537 through EBGP, so the PE-1 configuration is as follows:

# on PE-1:
configure
    router Base
        autonomous-system 65536
        bgp
            loop-detect discard-route
            split-horizon
            group "EBGP"
                neighbor 172.16.16.2
                    peer-as 65537
                exit
            exit
            group "IBGP"
                next-hop-self
                peer-as 65536
                neighbor 192.0.2.5
                exit
            exit
            no shutdown
        exit

PE-2 and PE-3 only peer with the route reflector. Their configuration is the same:

# on PE-2, PE-3:
configure
    router Base
        autonomous-system 65536
        bgp
            loop-detect discard-route
            split-horizon
            group "IBGP"
                peer-as 65536
                neighbor 192.0.2.5
                exit
            exit
            no shutdown
        exit

PE-4 also belongs to the cluster defined in the route reflector, but peers with ASBR-7 in AS 65538. The PE-4 configuration is similar to the configuration of PE-1.

Loopback address 10.1.11.1/24 is configured on ASBR-8 in AS 65540 (not shown in the example topology). ASBR-8 exports prefix 10.1.11.0/24 to its EBGP peers ASBR-6 in AS 65537 and ASBR-7 in AS 65538. ASBR-6 advertises prefix 10.1.11.0/24 to router PE-1; ASBR-7 advertises the same prefix to router PE-4.

RR-5 receives IBGP updates from PE-1 and PE-4, and selects the best path based on its own position in the topology. The IGP cost from RR-5 to PE-1 is 20, and the cost from RR-5 to PE-4 is 25, so RR-5 selects the BGP path with next hop 192.0.2.1.

*A:RR-5# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.5        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       100         None
      192.0.2.1                                          None        20
      65537 65540                                                    -
*i    10.1.11.0/24                                       100         None
      192.0.2.4                                          None        25
      65538 65540                                                    -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

RR-5 reflects the path with next hop 192.0.2.1 to all clients except PE-1, because PE-1 is the client from which the path was learned.
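
The selection and reflection performed by RR-5 can be reproduced with a small Python sketch that uses the values from the show output above. The comparison is deliberately reduced to three criteria (local preference, AS path length, IGP cost to the next hop); the real BGP decision process has more steps. The data structure and function name are hypothetical.

```python
# Paths for 10.1.11.0/24 as seen on RR-5 (values taken from the show output above)
paths = [
    {"next_hop": "192.0.2.1", "learned_from": "192.0.2.1",
     "local_pref": 100, "as_path_len": 2, "igp_cost": 20},
    {"next_hop": "192.0.2.4", "learned_from": "192.0.2.4",
     "local_pref": 100, "as_path_len": 2, "igp_cost": 25},
]
clients = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]


def best_path(candidates):
    """Simplified decision: highest local preference, shortest AS path, lowest IGP cost."""
    return min(candidates,
               key=lambda p: (-p["local_pref"], p["as_path_len"], p["igp_cost"]))


best = best_path(paths)
# The best path is reflected to every client except the one it was learned from
targets = [c for c in clients if c != best["learned_from"]]

print(best["next_hop"], targets)
# 192.0.2.1 ['192.0.2.2', '192.0.2.3', '192.0.2.4']
```

Because local preference and AS path length are equal for both paths, the IGP cost measured from RR-5 (20 versus 25) decides the outcome for all clients.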

For prefix 10.1.11.0/24, PE-1 received an EBGP route from ASBR-6 in AS 65537 with next hop 172.16.16.2 and no IBGP route from RR-5:

*A:PE-1# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.1        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       None        None
      172.16.16.2                                        None        0
      65537 65540                                                    -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

As a result, traffic offered to PE-1 for destination 10.1.11.0/24 is routed to ASBR-6, as follows:

*A:PE-1# show router route-table protocol bgp

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.1.11.0/24                                  Remote  BGP       00h01m33s  170
       172.16.16.2                                                  0
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

PE-2 received an IBGP route for prefix 10.1.11.0/24 with next hop 192.0.2.1 from RR-5:

*A:PE-2# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.2        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       100         None
      192.0.2.1                                          None        10
      65537 65540                                                    -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

Traffic offered to PE-2 for destination 10.1.11.0/24 is routed to PE-1, as follows:

*A:PE-2# show router route-table protocol bgp

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.1.11.0/24                                  Remote  BGP       00h00m40s  170
       192.168.12.1                                                 10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Likewise, PE-3 received an IBGP route for prefix 10.1.11.0/24 with next hop 192.0.2.1 from RR-5:

*A:PE-3# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.3        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       100         None
      192.0.2.1                                          None        20
      65537 65540                                                    -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

Traffic offered to PE-3 for destination 10.1.11.0/24 is routed via the interface address 192.168.23.1 on PE-2, as follows:

*A:PE-3# show router route-table protocol bgp

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.1.11.0/24                                  Remote  BGP       00h01m05s  170
       192.168.23.1                                                 20
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

For prefix 10.1.11.0/24, PE-4 received an EBGP route from ASBR-7 with next hop 172.16.47.2 and an IBGP route from RR-5 with next hop 192.0.2.1, as follows. EBGP routes are preferred over IBGP routes.

*A:PE-4# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.4        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       None        None
      172.16.47.2                                        None        0
      65538 65540                                                    -
*i    10.1.11.0/24                                       100         None
      192.0.2.1                                          None        35
      65537 65540                                                    -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================

The used route is the EBGP route from ASBR-7, so the traffic offered to PE-4 for destination 10.1.11.0/24 is routed to ASBR-7, as follows:

*A:PE-4# show router route-table protocol bgp

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.1.11.0/24                                  Remote  BGP       00h01m54s  170
       172.16.47.2                                                  0
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

This is summarized in Suboptimal route reflection. Ultimately, PE-1 only has one path, and so do PE-2 and PE-3. PE-4 has two paths, but by default prefers the EBGP learned path over the IBGP learned path. The routing is suboptimal on PE-3, where the IGP cost to PE-1 is 20 and the IGP cost to PE-4 is 15.

Figure 4. Suboptimal route reflection

Route reflection with ORR

To implement ORR in the non-hierarchical topology from Suboptimal route reflection, the route reflector RR-5 defines two locations in the optimal-route-reflection context. The primary IP address for location 1 is the PE-1 system IP address 192.0.2.1. The primary IP address for location 2 is loopback address 192.0.2.44 on PE-4, and the secondary IP address is loopback address 192.0.2.33 on PE-3. These addresses are used as the starting point for the SPF run. ORR locations 1 and 2 are then referred to from within the group definitions through the cluster command. The overall BGP configuration of RR-5 is as follows:

# on RR-5
configure
    router Base
        autonomous-system 65536
        bgp
            loop-detect discard-route
            split-horizon
            optimal-route-reflection
                spf-wait 1 initial-wait 1 second-wait 1
                location 1
                    primary-ip-address 192.0.2.1
                exit
                location 2
                    primary-ip-address 192.0.2.44      # loopback address on PE-4
                    secondary-ip-address 192.0.2.33    # loopback address on PE-3
                exit
            exit
            group "IBGP-1"
                cluster 192.0.2.5 orr-location 1 allow-local-fallback
                peer-as 65536
                neighbor 192.0.2.1
                exit
                neighbor 192.0.2.2
                exit
            exit
            group "IBGP-2"
                cluster 192.0.2.5 orr-location 2 allow-local-fallback
                peer-as 65536
                neighbor 192.0.2.3
                exit
                neighbor 192.0.2.4
                exit
            exit
            no shutdown
        exit

No changes are required in the BGP clients.

ASBR-6 advertises prefix 10.1.11.0/24 to router PE-1; ASBR-7 advertises the same prefix to router PE-4. RR-5 receives the updates from PE-1 and PE-4, and now performs two SPF runs because two locations are used. The first SPF run uses the 192.0.2.1 address of PE-1 as the starting point for the first location, selects the path via PE-1 as the best path, and reflects that path to the remaining peers in the first location. The second SPF run uses the 192.0.2.44 loopback address of PE-4 as the starting point for the second location, selects the path via PE-4 as the best path, and reflects that path to the remaining peers in the second location.
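
The per-location SPF runs can be sketched in Python with a plain Dijkstra calculation. The link metrics below are not stated in this chapter; they are inferred so that they reproduce the IGP costs visible in the show outputs (for example, 20 and 25 from RR-5, and 35 between PE-1 and PE-4), and they should be treated as assumptions. The loopback addresses are represented by the node that owns them, and all names in the sketch are hypothetical.

```python
import heapq

# Assumed IS-IS link metrics, inferred from the IGP costs in the show outputs
LINKS = {
    ("PE-1", "PE-2"): 10,
    ("PE-2", "PE-3"): 10,
    ("PE-3", "PE-4"): 15,
    ("RR-5", "PE-2"): 10,
    ("RR-5", "PE-3"): 10,
}


def build_graph(links):
    """Turn the undirected link list into an adjacency map."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def spf(graph, root):
    """Dijkstra: lowest IGP cost from `root` to every node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def best_exit(graph, seed, exits):
    """Return the exit router closest to `seed` and the cost to each exit."""
    dist = spf(graph, seed)
    costs = {e: dist[e] for e in exits}
    return min(exits, key=lambda e: costs[e]), costs


graph = build_graph(LINKS)
exits = ["PE-1", "PE-4"]  # BGP next hops for 10.1.11.0/24
seeds = {"RR-5 (no ORR)": "RR-5", "location 1": "PE-1", "location 2": "PE-4"}

for name, seed in seeds.items():
    best, costs = best_exit(graph, seed, exits)
    print(name, costs, "->", best)
# RR-5 (no ORR) {'PE-1': 20, 'PE-4': 25} -> PE-1
# location 1 {'PE-1': 0, 'PE-4': 35} -> PE-1
# location 2 {'PE-1': 35, 'PE-4': 0} -> PE-4
```

Under these assumed metrics, only the seed changes between the three runs, which is enough to move the selected exit for location 2 from PE-1 to PE-4.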

In comparison with the previous scenario, the routing for this prefix changes only on PE-3. RR-5 now reflects the route with next hop 192.0.2.4 to PE-3.

*A:PE-3# show router bgp routes
===============================================================================
 BGP Router ID:192.0.2.3        AS:65536       Local AS:65536
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     IGP Cost
      As-Path                                                        Label
-------------------------------------------------------------------------------
u*>i  10.1.11.0/24                                       100         None
      192.0.2.4                                          None        15
      65538 65540                                                    -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================

Traffic offered to PE-3 for destination 10.1.11.0/24 has next hop PE-4 and is routed via the interface address 192.168.34.2 on PE-4, as follows:

*A:PE-3# show router route-table protocol bgp

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.1.11.0/24                                  Remote  BGP       00h03m12s  170
       192.168.34.2                                                 15
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

This is summarized in Optimal route reflection.

Figure 5. Optimal route reflection

The following command provides the IGP distances for the configured reference points to all available BGP peers and all detected BGP next hops on the route reflector.

*A:RR-5# show router bgp optimal-route-reflection bgp-nh-info

===============================================================================
ORR BGP-NH Table (Router: Base)
===============================================================================
Location 1:
    Primary        : 192.0.2.1 [active]
    Secondary      : -
    Tertiary       : -
    Primary-ipv6   : -
    Secondary-ipv6 : -
    Tertiary-ipv6  : -
Location 2:
    Primary        : 192.0.2.44 [active]
    Secondary      : 192.0.2.33
    Tertiary       : -
    Primary-ipv6   : -
    Secondary-ipv6 : -
    Tertiary-ipv6  : -

Age          : 00h04m02s
Spf wait     : 1
Initial wait : 1
Second wait  : 1

-------------------------------------------------------------------------------
Next Hop
   Loc    Dest-Prefix
                               DB-Source  Type         Proto     Metric    Pref
-------------------------------------------------------------------------------

192.0.2.1
    1     192.0.2.1/32
                               IGP        Local        Local     0            0
    2     192.0.2.1/32
                               IGP        Remote       ISIS      35          15

192.0.2.4
    1     192.0.2.4/32
                               IGP        Remote       ISIS      35          15
    2     192.0.2.4/32
                               IGP        Local        Local     0            0
-------------------------------------------------------------------------------
No. of BGP-NHs: 2
===============================================================================

Conclusion

BGP optimal route reflection allows operators to optimize traffic streams through their network, even when the route reflector is placed out-of-path, for example in datacenters, thereby reducing the OPEX and CAPEX of route reflector deployment.