BGP Optimal Route Reflection for Non-Hierarchical Networks
This chapter provides information about BGP optimal route reflection for non-hierarchical networks.
Applicability
This chapter was initially written based on SR OS Release 15.0.R4, but the MD-CLI in the current edition corresponds to SR OS Release 23.7.R2.
Overview
BGP route reflectors are used in many networks. They improve network scalability by eliminating or reducing the need for a full-mesh of IBGP sessions.
When a BGP route reflector receives multiple paths for the same IP destination, it normally selects a single best path based on its own location in the routing domain and reflects that path to all clients in the domain. In Centralized route reflection, the centralized route reflector RR for ISP-1 is located in the datacenter (DC) and receives prefix X from ISP-2 through PE-2 in point of presence PoP-1 and also through PE-3 in PoP-2. Because RR is closer to PoP-1 than to PoP-2, it selects the path through PE-2 as the best path and reflects it to the remaining route reflector clients, so the traffic to destination X flows as indicated. As a result, clients in PoP-2 cannot use hot-potato routing, that is, send traffic to another autonomous system (AS) through the closest possible exit point from the local AS.
Hot-potato routing can be achieved when the route reflector selects and reflects multiple best paths, one per subdomain, each computed from the point of view of a client in that subdomain, as outlined in RFC 9107, BGP Optimal Route Reflection (ORR). This requires the route reflector to know the topology of each subdomain. In Centralized route reflection with ORR, the route reflector calculates the best path for PoP-1 and reflects it to the clients in PoP-1 (PE-1), and it also calculates the best path for PoP-2 and reflects it to the clients in PoP-2 (PE-4).
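The difference between classic reflection and ORR can be illustrated with a minimal Python sketch. The IGP costs below are hypothetical (they are not taken from this chapter), and the sketch assumes that all other BGP attributes tie, so that the IGP cost to the exit router decides the best path.

```python
# Hypothetical IGP costs (assumption, not from the chapter):
# IGP_COST[vantage_point][exit_router]
IGP_COST = {
    "RR":   {"PE-2": 10, "PE-3": 30},   # RR in the DC, closer to PoP-1
    "PE-1": {"PE-2": 5,  "PE-3": 40},   # PE-1 is a client in PoP-1
    "PE-4": {"PE-2": 40, "PE-3": 5},    # PE-4 is a client in PoP-2
}


def best_exit(vantage_point, exits):
    """Return the exit router with the lowest IGP cost from vantage_point."""
    return min(exits, key=lambda e: IGP_COST[vantage_point][e])


exits_for_x = ["PE-2", "PE-3"]

# Classic reflection: a single choice, made from the RR's own position.
classic = best_exit("RR", exits_for_x)

# ORR: one choice per subdomain, made from a reference point inside it.
orr = {pop: best_exit(ref, exits_for_x)
       for pop, ref in {"PoP-1": "PE-1", "PoP-2": "PE-4"}.items()}

print(classic)  # PE-2
print(orr)      # {'PoP-1': 'PE-2', 'PoP-2': 'PE-3'}
```

With classic reflection, every client receives the path through PE-2; with ORR, the clients in PoP-2 receive the path through PE-3 instead.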
If the routing domain is non-hierarchical, the route reflector is part of the routing domain and thus has a view on the entire topology through the interior gateway protocol (IGP).
If the routing domain is hierarchical, the route reflector needs to extract the link state database (LSDB) from the subdomain it is not part of, which is achieved through BGP link state (BGP-LS). The use of BGP-LS allows the route reflector to learn the IGP topology information for OSPF areas and IS-IS levels in which the route reflector is not a direct participant. See the BGP Optimal Route Reflection for Hierarchical Networks chapter if the network topology is hierarchical.
ORR CLI commands
The BGP optimal-route-reflection context defines the shortest path first (SPF) timers and one or more locations.
*[ex:/configure router "Base" bgp]
A:admin@RR-5# optimal-route-reflection ?
optimal-route-reflection
location + Enter the location list instance
spf-wait + Enter the spf-wait context
The timing of the SPF calculation is configurable with the spf-wait command. The initial-wait and second-wait arguments are optional. Together with max-wait, these timers define when to initiate the first, second, and subsequent SPF runs after a topology change occurs.
*[ex:/configure router "Base" bgp optimal-route-reflection]
A:admin@RR-5# spf-wait ?
spf-wait
initial-wait - Initial SPF calculation delay after a topology change
max-wait - Maximum interval between consecutive SPF calculations
second-wait - Delay between first and second SPF calculation
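As a rough illustration of how the three timers could interact, the following Python sketch produces the delay before each consecutive SPF run. The chapter does not spell out how the interval grows after the second run; the sketch assumes that it doubles until it reaches max-wait, which is a common SPF back-off pattern but is not confirmed here for ORR. The function name and the example values are hypothetical.

```python
def spf_delays(initial_wait, second_wait, max_wait, runs):
    """Return the assumed delay (in seconds) before each consecutive SPF run.

    Assumption: the first run waits initial_wait, the second run waits
    second_wait, and each later run doubles the previous interval,
    capped at max_wait.
    """
    delays = []
    for n in range(runs):
        if n == 0:
            delay = initial_wait
        else:
            delay = second_wait * 2 ** (n - 1)  # assumed doubling
        delays.append(min(delay, max_wait))
    return delays


print(spf_delays(initial_wait=1, second_wait=2, max_wait=10, runs=6))
# [1, 2, 4, 8, 10, 10]
```

When all three timers are set to the same value, every delay in this model is identical, so the back-off has no effect.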
Multiple locations can be created in the optimal-route-reflection context, as follows. Each location is identified by a location ID in the range 1 to 255 and contains a primary IP address and, optionally, a secondary and a tertiary IP address for redundancy. These addresses must correspond to loopback or system IP addresses of routers participating in the IGP, and are used as the starting point (or seed) for the SPF calculation. Because all clients in the same location receive the same optimal path for that location, these addresses must be topologically close to the clients in that part of the network.
*[ex:/configure router "Base" bgp optimal-route-reflection location 1]
A:admin@RR-5# ?
apply-groups - Apply a configuration group at this level
apply-groups-exclude - Exclude a configuration group at this level
primary-ip-address - Primary IPv4 address of the reference location for ORR
primary-ipv6-address - Primary IPv6 address of the reference location for ORR
secondary-ip-address - Secondary IPv4 address of reference location for ORR
secondary-ipv6- - Secondary IPv6 address of reference location for ORR
address
tertiary-ip-address - Tertiary IPv4 address of the reference location for ORR
tertiary-ipv6- - Tertiary IPv6 address of the reference location for ORR
address
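The redundancy between the primary, secondary, and tertiary addresses can be modeled with a small Python sketch. It assumes that the first address that is reachable in the IGP, in the order primary, secondary, tertiary, becomes the active SPF seed; this ordering is an assumption made for illustration. The class name, the method name, and the addresses used are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class OrrLocation:
    """Hypothetical model of one ORR location (IPv4 addresses only)."""
    location_id: int
    primary: str
    secondary: Optional[str] = None
    tertiary: Optional[str] = None

    def __post_init__(self):
        # The location ID range 1..255 is taken from the text above.
        if not 1 <= self.location_id <= 255:
            raise ValueError("location ID must be in the range 1..255")

    def active_address(self, reachable: Set[str]) -> Optional[str]:
        """Return the first configured address found in the IGP (assumed order)."""
        for addr in (self.primary, self.secondary, self.tertiary):
            if addr is not None and addr in reachable:
                return addr
        return None


loc = OrrLocation(2, primary="192.0.2.44", secondary="192.0.2.33")
print(loc.active_address({"192.0.2.44", "192.0.2.33"}))  # 192.0.2.44
print(loc.active_address({"192.0.2.33"}))                # 192.0.2.33
```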
The locations are then referred to with the cluster command (residing in the BGP group or neighbor context) through the orr-location argument, as follows.
*[ex:/configure router "Base" bgp group "IBGP-1"]
A:admin@RR-5# cluster ?
cluster
allow-local-fallback - Allow fallback to RR topology location
cluster-id - Route reflector cluster ID
orr-location - Optimal route reflection location for the cluster
*[ex:/configure router "Base" bgp neighbor "192.0.2.3"]
A:admin@RR-5# cluster ?
cluster
allow-local-fallback - Allow fallback to RR topology
cluster-id - Route reflector cluster ID
orr-location - Optimal route reflection location for the cluster
The location ID is referred to in the orr-location argument of the cluster command. Typically, the cluster command applies to a BGP peer group, so all neighbors in that group share the same location ID unless the cluster command is also configured at the neighbor level. The allow-local-fallback option allows the RR to advertise the best reachable BGP path as seen from its own location, but only when no BGP routes are reachable from the configured location. Without this option, no path is advertised to the clients of that location in that case.
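The two rules above (neighbor-level configuration taking precedence over the group, and the fallback to the RR's own location) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the SR OS implementation: the function names and dictionary layout are hypothetical, and a next hop that is missing from a cost dictionary is treated as unreachable from that vantage point.

```python
def effective_location(neighbor_cfg, group_cfg):
    """Neighbor-level orr-location, if present, overrides the group value."""
    return neighbor_cfg.get("orr-location", group_cfg.get("orr-location"))


def path_to_advertise(next_hops, location_cost, rr_cost, allow_local_fallback):
    """Pick the next hop to reflect to the clients of one location.

    location_cost / rr_cost map next hop -> IGP cost as seen from the ORR
    location and from the RR itself; a missing key means unreachable.
    """
    reachable = [nh for nh in next_hops if nh in location_cost]
    if reachable:
        return min(reachable, key=location_cost.get)
    if allow_local_fallback:
        local = [nh for nh in next_hops if nh in rr_cost]
        if local:
            return min(local, key=rr_cost.get)
    return None  # nothing is advertised to this location


next_hops = ["192.0.2.1", "192.0.2.4"]
rr_view = {"192.0.2.1": 20, "192.0.2.4": 25}

print(effective_location({}, {"orr-location": 1}))                   # 1
print(effective_location({"orr-location": 2}, {"orr-location": 1}))  # 2
print(path_to_advertise(next_hops, {}, rr_view, True))               # 192.0.2.1
print(path_to_advertise(next_hops, {}, rr_view, False))              # None
```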
Properties
The following properties apply to ORR in SR OS:
ORR is supported in the Base router BGP instance.
ORR is supported for the IPv4, label-IPv4, label-IPv6, VPN-IPv4, and VPN-IPv6 address families.
ORR is supported with add-paths, meaning that add-paths advertised to ORR clients are also ORR location-based.
Configuration
Example non-hierarchical networking using IS-IS shows the example topology. IS-IS is used as the IGP for AS 65536, with RR-5 taking the role of the route reflector for clients PE-1 to PE-4. Additionally, ASBR-6 in AS 65537 peers with PE-1, and ASBR-7 in AS 65538 peers with PE-4.
The initial configuration on all nodes includes:
Cards, MDAs, and ports
Router interfaces
IS-IS as IGP on all interfaces within AS 65536, in a non-hierarchical way (alternatively, OSPF can be used), and traffic engineering enabled
The basic IS-IS configuration is very similar for all routers, including the route reflector. The RR-5 configuration is as follows:
# on RR-5:
configure {
    router "Base" {
        isis 0 {
            admin-state enable
            traffic-engineering true
            area-address [49.0001]
            interface "int-RR-5-PE-2" {
                interface-type point-to-point
            }
            interface "int-RR-5-PE-3" {
                interface-type point-to-point
            }
            interface "system" {
            }
        }
    }
}
Route reflection without ORR
RR-5 peers with clients PE-1 to PE-4, and because RR-5 is the route reflector, the cluster command is added, defining the cluster ID attribute value to use. The configuration for RR-5 is as follows:
# on RR-5:
configure {
    router "Base" {
        autonomous-system 65536
        bgp {
            loop-detect discard-route
            split-horizon true
            group "IBGP" {
                peer-as 65536
                cluster {
                    cluster-id 192.0.2.5
                }
            }
            neighbor "192.0.2.1" {
                group "IBGP"
            }
            neighbor "192.0.2.2" {
                group "IBGP"
            }
            neighbor "192.0.2.3" {
                group "IBGP"
            }
            neighbor "192.0.2.4" {
                group "IBGP"
            }
        }
    }
}
PE-1 belongs to the cluster defined in the route reflector, so it does not need to be fully meshed with the other routers in the area; peering with the route reflectors in the area is sufficient for PE-1 to receive updates. Typically, two route reflectors are provisioned for redundancy, but that does not apply in this example. PE-1 also peers with ASBR-6 in AS 65537 through EBGP, so the PE-1 configuration is as follows:
# on PE-1:
configure {
    router "Base" {
        autonomous-system 65536
        bgp {
            loop-detect discard-route
            split-horizon true
            group "EBGP" {
            }
            group "IBGP" {
                next-hop-self true
                peer-as 65536
            }
            neighbor "172.16.16.2" {
                group "EBGP"
                peer-as 65537
                ebgp-default-reject-policy {
                    import false
                }
            }
            neighbor "192.0.2.5" {
                group "IBGP"
            }
        }
    }
}
PE-2 and PE-3 only peer with the route reflector. Their configuration is the same:
# on PE-2, PE-3:
configure {
    router "Base" {
        autonomous-system 65536
        bgp {
            loop-detect discard-route
            split-horizon true
            group "IBGP" {
                peer-as 65536
            }
            neighbor "192.0.2.5" {
                group "IBGP"
            }
        }
    }
}
PE-4 also belongs to the IBGP cluster defined in the route reflector and PE-4 peers with ASBR-7 in AS 65538. The PE-4 configuration is similar to the configuration of PE-1.
Loopback address 10.1.11.1/24 is configured on ASBR-8 in AS 65540 (not shown in the example topology). ASBR-8 exports prefix 10.1.11.0/24 to its EBGP peers ASBR-6 in AS 65537 and ASBR-7 in AS 65538. ASBR-6 advertises prefix 10.1.11.0/24 to router PE-1; ASBR-7 advertises the same prefix to router PE-4.
RR-5 receives IBGP updates from PE-1 and PE-4, and selects the best path based on its own position in the topology. The IGP cost from RR-5 to PE-1 is 20, and the cost from RR-5 to PE-4 is 25, so RR-5 selects the BGP path with next hop 192.0.2.1.
[/]
A:admin@RR-5# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.5 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 100 None
192.0.2.1 None 20
65537 65540 -
*i 10.1.11.0/24 100 None
192.0.2.4 None 25
65538 65540 -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================
RR-5 reflects the path with next hop 192.0.2.1 to all clients except PE-1, because PE-1 is the client from which the path was learned.
For prefix 10.1.11.0/24, PE-1 received an EBGP route from ASBR-6 in AS 65537 with next hop 172.16.16.2 and no IBGP route from RR-5:
[/]
A:admin@PE-1# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.1 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 None None
172.16.16.2 None 0
65537 65540 -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
As a result, traffic offered to PE-1 for destination 10.1.11.0/24 is routed to ASBR-6, as follows:
[/]
A:admin@PE-1# show router route-table protocol bgp
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
10.1.11.0/24 Remote BGP 00h04m15s 170
172.16.16.2 0
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
PE-2 received an IBGP route for prefix 10.1.11.0/24 with next hop 192.0.2.1 from RR-5:
[/]
A:admin@PE-2# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.2 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 100 None
192.0.2.1 None 10
65537 65540 -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
Traffic offered to PE-2 for destination 10.1.11.0/24 is routed to PE-1, as follows:
[/]
A:admin@PE-2# show router route-table protocol bgp
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
10.1.11.0/24 Remote BGP 00h17m22s 170
192.168.12.1 10
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
Likewise, PE-3 received an IBGP route for prefix 10.1.11.0/24 with next hop 192.0.2.1 from RR-5:
[/]
A:admin@PE-3# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.3 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 100 None
192.0.2.1 None 20
65537 65540 -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
Traffic offered to PE-3 for destination 10.1.11.0/24 is routed via the interface address 192.168.23.1 on PE-2, as follows:
[/]
A:admin@PE-3# show router route-table protocol bgp
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
10.1.11.0/24 Remote BGP 00h10m26s 170
192.168.23.1 20
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
For prefix 10.1.11.0/24, PE-4 received an EBGP route from ASBR-7 with next hop 172.16.47.2 and an IBGP route from RR-5 with next hop 192.0.2.1, as follows. EBGP routes are preferred over IBGP routes.
[/]
A:admin@PE-4# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.4 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 None None
172.16.47.2 None 0
65538 65540 -
*i 10.1.11.0/24 100 None
192.0.2.1 None 35
65537 65540 -
-------------------------------------------------------------------------------
Routes : 2
===============================================================================
The used route is the EBGP route from ASBR-7, so the traffic offered to PE-4 for destination 10.1.11.0/24 is routed to ASBR-7, as follows:
[/]
A:admin@PE-4# show router route-table protocol bgp
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
10.1.11.0/24 Remote BGP 00h18m08s 170
172.16.47.2 0
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
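The selection made by PE-4 can be sketched as a simplified two-step comparison in Python. This is only a fragment of the BGP decision process: it assumes that both paths tie on the earlier steps (here, both AS paths have length 2), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    ebgp: bool      # True if learned from an EBGP peer
    igp_cost: int   # IGP cost to the BGP next hop


def prefer(a, b):
    """Simplified tie-break: EBGP over IBGP, then lowest IGP cost."""
    if a.ebgp != b.ebgp:
        return a if a.ebgp else b
    return a if a.igp_cost <= b.igp_cost else b


# Values taken from the PE-4 show output above.
ebgp_path = Path("172.16.47.2", ebgp=True, igp_cost=0)
ibgp_path = Path("192.0.2.1", ebgp=False, igp_cost=35)

print(prefer(ebgp_path, ibgp_path).next_hop)  # 172.16.47.2
```

Because the EBGP check comes first, PE-4 keeps using its own exit toward ASBR-7 regardless of which path the route reflector sends.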
This is summarized in Suboptimal route reflection. Ultimately, PE-1 only has one path, and so do PE-2 and PE-3. PE-4 has two paths, but by default prefers the EBGP-learned path over the IBGP-learned path. The routing is suboptimal on PE-3: its IGP cost to PE-1 is 20, whereas its IGP cost to PE-4 is only 15.
Route reflection with ORR
To implement ORR in the non-hierarchical topology from Suboptimal route reflection, the route reflector RR-5 defines two locations in the optimal-route-reflection context. The primary IP address for location 1 is the PE-1 system IP address 192.0.2.1; the primary IP address for location 2 is loopback address 192.0.2.44 on PE-4 and the secondary IP address is loopback address 192.0.2.33 on PE-3. These addresses are used as the starting point for the SPF run. The ORR locations 1 and 2 are then referred to from within the group definitions through the cluster command. The overall BGP configuration of RR-5 is as follows:
# on RR-5:
configure {
    router "Base" {
        autonomous-system 65536
        bgp {
            loop-detect discard-route
            split-horizon true
            optimal-route-reflection {
                spf-wait {
                    max-wait 1
                    initial-wait 1
                    second-wait 1
                }
                location 1 {
                    primary-ip-address 192.0.2.1
                }
                location 2 {
                    primary-ip-address 192.0.2.44      # loopback address on PE-4
                    secondary-ip-address 192.0.2.33    # loopback address on PE-3
                }
            }
            group "IBGP-1" {
                peer-as 65536
                cluster {
                    cluster-id 192.0.2.5
                    orr-location 1
                    allow-local-fallback true
                }
            }
            group "IBGP-2" {
                peer-as 65536
                cluster {
                    cluster-id 192.0.2.5
                    orr-location 2
                    allow-local-fallback true
                }
            }
            neighbor "192.0.2.1" {
                group "IBGP-1"
            }
            neighbor "192.0.2.2" {
                group "IBGP-1"
            }
            neighbor "192.0.2.3" {
                group "IBGP-2"
            }
            neighbor "192.0.2.4" {
                group "IBGP-2"
            }
        }
    }
}
No changes are required in the BGP clients.
ASBR-6 advertises prefix 10.1.11.0/24 to router PE-1; ASBR-7 advertises the same prefix to router PE-4. RR-5 receives the updates from PE-1 and PE-4, and now performs two SPF runs because two locations are used. The first SPF run uses the 192.0.2.1 address of PE-1 as the starting point for the first location, selects the path via PE-1 as the best path, and reflects that path to the remaining peers in the first location. The second SPF run uses the 192.0.2.44 loopback address of PE-4 as the starting point for the second location, selects the path via PE-4 as the best path, and reflects that path to the remaining peers in the second location.
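The per-location SPF runs can be reproduced with a short Dijkstra sketch in Python. The link metrics below are an assumption: they are inferred from the IGP costs shown in the show outputs of this chapter (for example, 10 from PE-2 to PE-1 and 15 from PE-3 to PE-4) and may not list every link of the real topology. Each location is modeled by the router that owns its primary address.

```python
import heapq

# Link metrics inferred from the IGP costs in the show outputs (assumption).
LINKS = [
    ("PE-1", "PE-2", 10),
    ("PE-2", "PE-3", 10),
    ("PE-3", "PE-4", 15),
    ("RR-5", "PE-2", 10),
    ("RR-5", "PE-3", 10),
]


def spf(root):
    """Dijkstra: return {node: lowest IGP cost from root}."""
    graph = {}
    for a, b, metric in LINKS:
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, metric in graph[node]:
            new_cost = cost + metric
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return dist


BGP_NEXT_HOPS = ["PE-1", "PE-4"]          # exits for 10.1.11.0/24
LOCATION_ROOT = {1: "PE-1", 2: "PE-4"}    # router owning the primary address

for loc, root in LOCATION_ROOT.items():
    print("location", loc, "->", min(BGP_NEXT_HOPS, key=spf(root).get))
print("RR-5 own view ->", min(BGP_NEXT_HOPS, key=spf("RR-5").get))
```

Under these inferred metrics, location 1 selects PE-1, location 2 selects PE-4, and the RR's own view (cost 20 to PE-1 versus 25 to PE-4) selects PE-1, which matches the behavior without ORR.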
In comparison with the previous scenario, the routing for this prefix only changes on PE-3. RR-5 now reflects the route with next hop 192.0.2.4 to PE-3.
[/]
A:admin@PE-3# show router bgp routes
===============================================================================
BGP Router ID:192.0.2.3 AS:65536 Local AS:65536
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.1.11.0/24 100 None
192.0.2.4 None 15
65538 65540 -
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
Traffic offered to PE-3 for destination 10.1.11.0/24 has next hop PE-4 and is routed via the interface address 192.168.34.2 on PE-4, as follows:
[/]
A:admin@PE-3# show router route-table protocol bgp
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
10.1.11.0/24 Remote BGP 00h02m06s 170
192.168.34.2 15
-------------------------------------------------------------------------------
No. of Routes: 1
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
This is summarized in Optimal route reflection.
The following command provides the IGP distances for the configured reference points to all available BGP peers and all detected BGP next hops on the route reflector.
[/]
A:admin@RR-5# show router bgp optimal-route-reflection bgp-nh-info
===============================================================================
ORR BGP-NH Table (Router: Base)
===============================================================================
Location 1:
Primary : 192.0.2.1 [active]
Secondary : -
Tertiary : -
Primary-ipv6 : -
Secondary-ipv6 : -
Tertiary-ipv6 : -
Location 2:
Primary : 192.0.2.44 [active]
Secondary : 192.0.2.33
Tertiary : -
Primary-ipv6 : -
Secondary-ipv6 : -
Tertiary-ipv6 : -
Age : 00h02m55s
Spf wait : 1
Initial wait : 1
Second wait : 1
-------------------------------------------------------------------------------
Next Hop
Loc Dest-Prefix
DB-Source Type Proto Metric Pref
-------------------------------------------------------------------------------
192.0.2.1
1 192.0.2.1/32
IGP Local Local 0 0
2 192.0.2.1/32
IGP Remote ISIS 35 18
192.0.2.4
1 192.0.2.4/32
IGP Remote ISIS 35 18
2 192.0.2.4/32
IGP Local Local 0 0
-------------------------------------------------------------------------------
No. of BGP-NHs: 2
===============================================================================
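For scripted checks, the next-hop rows of this output can be turned into a per-location metric table. The following Python sketch is a hypothetical helper, not an SR OS tool: it assumes the three-line row layout shown above (next hop, then location and prefix, then DB-Source, Type, Proto, Metric, and Pref) and has only been reasoned through against this sample.

```python
SAMPLE = """\
192.0.2.1
1 192.0.2.1/32
IGP Local Local 0 0
2 192.0.2.1/32
IGP Remote ISIS 35 18
192.0.2.4
1 192.0.2.4/32
IGP Remote ISIS 35 18
2 192.0.2.4/32
IGP Local Local 0 0
"""


def parse_nh_table(text):
    """Return {location: {next_hop: metric}} from the next-hop rows."""
    table, next_hop, location = {}, None, None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1:                              # next-hop line
            next_hop = fields[0]
        elif len(fields) == 2 and fields[0].isdigit():    # Loc Dest-Prefix
            location = int(fields[0])
        elif len(fields) == 5 and location is not None:   # ... Metric Pref
            table.setdefault(location, {})[next_hop] = int(fields[3])
    return table


metrics = parse_nh_table(SAMPLE)
closest = {loc: min(nhs, key=nhs.get) for loc, nhs in metrics.items()}
print(closest)  # {1: '192.0.2.1', 2: '192.0.2.4'}
```

The closest next hop per location matches the paths that RR-5 reflects to the clients of location 1 and location 2.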
Conclusion
BGP optimal route reflection allows operators to optimize traffic streams through their network, even when the route reflector is placed out-of-path, for example in datacenters, thereby reducing the OPEX and CAPEX of route reflector deployment.