BGP Segment Routing Using the Prefix SID Attribute
This chapter describes BGP Segment Routing using the prefix SID attribute.
Applicability
The information and configuration in this chapter are based on SR OS Release 23.3.R1. BGP Segment Routing (SR) is supported in SR OS Release 19.10.R1, and later.
Overview
Segment Routing (SR) has become a foundational technology for Software-Defined Networking (SDN) in Wide Area Networks (WANs). Also, SR is being extended beyond WAN borders into Data Centers (DCs).
SR allows an ingress node to route a packet from the source, by prepending an SR header containing an ordered list of segment identifiers (SIDs). A SID represents a topological or service-based instruction. A SID can have a local meaning for one specific node, or a global meaning within the SR domain, such as the instruction to forward a packet on the Equal-Cost Multipath (ECMP) aware shortest path to reach some prefix.
In WANs, infrastructure IP reachability is nearly always conveyed by an IGP, such as OSPF or IS-IS, but in large-scale DCs, BGP has become the protocol of choice. In a typical DC design, BGP is used for endpoint reachability, as follows:
Each node (Top of Rack (TOR), leaf, spine, and so on) has its own Autonomous System (AS).
Each node has an eBGP session to each of its directly connected peers.
Each node originates the IPv4 (or IPv6) address of its loopback interface into BGP and announces it to its neighbors.
To extend SR-MPLS into DCs that use this type of BGP design, the SR OS nodes must advertise their loopback IP prefix in a BGP labeled-unicast (BGP-LU) IPv4 route with a prefix SID attribute. The prefix SID attribute is ignored when attached to other types of BGP routes, including BGP-LU IPv6 routes, but it is still propagated.
A BGP prefix SID is always a global SID within the SR domain and identifies an instruction to forward the packet along the ECMP-aware BGP-computed best paths to reach the prefix. The BGP prefix SID attribute can also help to create SR paths that transit across multiple administrative domains that do not share IGP SR topology information.
Figure "BGP-LU IPv4 route with prefix SID BGP path attribute" shows a node in AS 64501 advertising a BGP-LU IPv4 route for prefix 10.0.0.1/32 with SID 20101. The SR-capable nodes forward packets with SID 20101 via the best BGP path to 10.0.0.1, using any of the available multipaths computed by BGP.
The BGP prefix SID attribute with type code 40 is an optional and transitive BGP path attribute, meaning that the attribute is expected to be propagated by routers that do not recognize the type value. When SR is deployed using an MPLS dataplane (SR-MPLS), the BGP prefix SID encodes:
A 32-bit label-index Type-Length-Value (TLV) (mandatory TLV)
An originator Segment Routing Global Block (SRGB) TLV containing one or more SRGB fields (optional TLV). If the SRGB field occurs multiple times in the SRGB TLV, the SRGB space of the ingress node consists of multiple ranges that are concatenated.
Figure "BGP signaling overview" shows that node PE-1 exports a BGP-LU IPv4 route with prefix 10.0.0.1/32 and label 20101. The BGP prefix SID attribute is attribute type 40 and contains an SR label index of 1 and the originator SRGB with start label 20100 and size 100 (from 20100 to 20199). Node PE-2 imports the BGP-LU IPv4 route and exports it to the next node.
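The attribute contents described above can be sketched in Python. This is a minimal illustration, assuming the TLV layouts defined in RFC 8669 (Label-Index TLV type 1 and Originator SRGB TLV type 3); the function names are hypothetical and are not part of SR OS.

```python
import struct

ATTR_FLAGS_OPTIONAL_TRANSITIVE = 0xC0  # optional + transitive bits set
ATTR_TYPE_PREFIX_SID = 40
TLV_LABEL_INDEX = 1        # assumption: TLV type values as in RFC 8669
TLV_ORIGINATOR_SRGB = 3


def label_index_tlv(label_index: int) -> bytes:
    """Type (1) + length (2) + reserved (1) + flags (2) + label index (4)."""
    return struct.pack("!BHBHI", TLV_LABEL_INDEX, 7, 0, 0, label_index)


def originator_srgb_tlv(srgbs: list[tuple[int, int]]) -> bytes:
    """Type (1) + length (2) + flags (2) + one 6-byte (base, range) per SRGB."""
    body = struct.pack("!H", 0)  # flags, all zero
    for base, size in srgbs:
        body += base.to_bytes(3, "big") + size.to_bytes(3, "big")
    return struct.pack("!BH", TLV_ORIGINATOR_SRGB, len(body)) + body


def prefix_sid_attribute(label_index: int, srgbs: list[tuple[int, int]]) -> bytes:
    """Path attribute header (flags, type, length) followed by both TLVs."""
    value = label_index_tlv(label_index) + originator_srgb_tlv(srgbs)
    header = struct.pack("!BBB", ATTR_FLAGS_OPTIONAL_TRANSITIVE,
                         ATTR_TYPE_PREFIX_SID, len(value))
    return header + value


# Values advertised by PE-1: label index 1, originator SRGB 20100 with size 100.
attr = prefix_sid_attribute(1, [(20100, 100)])
print(len(attr), attr.hex())
```

With these inputs, the attribute is 24 bytes long: a 3-byte header, a 10-byte Label-Index TLV, and an 11-byte Originator SRGB TLV.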
To add, replace, or process a BGP prefix SID, SR must be administratively enabled in the bgp context. The BGP prefix SID range can be set to either global (that is, equal to the SRGB also used by SR-OSPF or SR-ISIS and defined in the router "Base" mpls-labels sr-labels context) or a subset of the SRGB defined by the start-label command in combination with max-index. All BGP prefix SID values must reside within the global SRGB; otherwise, the start-label command fails. Configuring the prefix-sid-range is mandatory.
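The containment rule can be illustrated with a short Python check. This is only a sketch of the arithmetic, not the SR OS validation logic; the function name is hypothetical, and the sample values (SRGB 20000 to 20999, start label 20100, max-index 99) are the ones used in the configuration later in this chapter.

```python
def prefix_sid_range(srgb_start: int, srgb_end: int,
                     start_label: int, max_index: int) -> tuple[int, int]:
    """Return the (first, last) label of the BGP prefix SID range.

    Raises ValueError when the range is not fully contained in the SRGB.
    """
    last_label = start_label + max_index  # index 0 maps to start_label
    if start_label < srgb_start or last_label > srgb_end:
        raise ValueError(
            f"range {start_label}..{last_label} is outside "
            f"SRGB {srgb_start}..{srgb_end}"
        )
    return start_label, last_label


print(prefix_sid_range(20000, 20999, 20100, 99))
```

A range such as start label 20950 with max-index 99 would be rejected in this sketch, because label 21049 lies beyond the end of the SRGB.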
To originate BGP SR prefixes, two policies are required with an sr-label-index action, which may or may not be identical:
route-table-import policy-name <policy-name> used to populate a local BGP-SR table with an SR label index
export policy [<policy>] to advertise a prefix to a neighbor with an SR label index
In the example topology used in this chapter, the import and export policies are identical and have an action entry with action-type accept with sr-label-index with value 1, so on PE-1, the prefix SID for the prefix 10.0.0.1/32 equals 20101, which is the sum of the start label for the prefix SID range 20100 and the SR label index 1.
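The label derivation is a simple sum, sketched below in Python under the assumption that valid indexes run from 0 to max-index. The helper name is hypothetical; index 4 is the value that PE-4 uses in this chapter.

```python
def prefix_sid_label(start_label: int, max_index: int, label_index: int) -> int:
    """Return the label for a label index inside the BGP prefix SID range."""
    if not 0 <= label_index <= max_index:
        raise ValueError(f"label index {label_index} is outside 0..{max_index}")
    return start_label + label_index


# Values from this chapter: start-label 20100, max-index 99.
for node, index in (("PE-1", 1), ("PE-4", 4)):
    print(node, prefix_sid_label(20100, 99, index))
```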
A unique label index value must be assigned to each different IPv4 prefix that is advertised with a BGP prefix SID. However, in case of a conflict with another SR-programmed Label Forwarding Information Base (LFIB) entry, the conflict situation is addressed as follows:
If the conflict is with another BGP-LU IPv4 route for a different prefix with a prefix SID attribute, all the conflicting BGP-LU IPv4 routes for both prefixes are advertised with normal BGP-LU labels from the dynamic label range, not from the dedicated SR label range.
If the conflict is with an IGP route and the route-table-import policy action does not contain the prefer-igp in the sr-label-index command, the BGP-LU IPv4 route loses to the IGP route and is advertised with a normal BGP-LU label from the dynamic label range.
If the conflict is with an IGP route and the route-table-import policy action contains the prefer-igp in the sr-label-index command, this is not considered a conflict and BGP uses the IGP-signaled label index to derive its advertised label. This stitches the BGP SR tunnel to the IGP SR tunnel.
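The three cases above can be modeled in a few lines of Python. This is a simplified sketch, not the SR OS implementation: it assumes that a conflict means two different prefixes sharing the same label index within a common SRGB, and all names and sample prefixes other than 10.0.0.1/32 are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    label_index: int
    prefer_igp: bool = False  # models prefer-igp in the sr-label-index action


def resolve_labels(bgp_routes, igp_sids):
    """Map each BGP prefix to 'sr-label', 'dynamic-label', or 'igp-stitched'.

    igp_sids maps a prefix to the label index signaled by the IGP.
    """
    result = {}
    remaining = []
    for route in bgp_routes:
        # Case 3: prefer-igp reuses the IGP-signaled index, so no conflict.
        if route.prefer_igp and route.prefix in igp_sids:
            result[route.prefix] = "igp-stitched"
        else:
            remaining.append(route)

    prefixes_by_index = defaultdict(set)
    for route in remaining:
        prefixes_by_index[route.label_index].add(route.prefix)
    igp_indexes = set(igp_sids.values())

    for route in remaining:
        if route.label_index in igp_indexes:
            # Case 2: the IGP route wins; BGP falls back to a dynamic label.
            result[route.prefix] = "dynamic-label"
        elif len(prefixes_by_index[route.label_index]) > 1:
            # Case 1: all conflicting BGP prefixes fall back to dynamic labels.
            result[route.prefix] = "dynamic-label"
        else:
            result[route.prefix] = "sr-label"
    return result


routes = [
    BgpRoute("10.0.0.1/32", 1),
    BgpRoute("10.0.0.5/32", 5),
    BgpRoute("10.0.0.6/32", 5),                   # same index as 10.0.0.5/32
    BgpRoute("10.0.0.7/32", 7),                   # index also used by the IGP
    BgpRoute("10.0.0.8/32", 8, prefer_igp=True),  # stitched to the IGP SID
]
igp_sids = {"10.0.0.9/32": 7, "10.0.0.8/32": 8}
labels = resolve_labels(routes, igp_sids)
for prefix, outcome in labels.items():
    print(prefix, outcome)
```

In this model, only 10.0.0.1/32 keeps a label from the dedicated SR label range; the two prefixes sharing index 5 and the prefix whose index collides with the IGP fall back to dynamic labels, and 10.0.0.8/32 is stitched to the IGP SR tunnel.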
Stitching of SR-ISIS or SR-OSPF to SR-BGP is one of the main advantages of implementing SR-BGP.
Any /32 BGP-LU IPv4 route containing a prefix SID attribute is resolvable and usable in the same way as /32 BGP-LU IPv4 routes without prefix SID attribute. The routes can be installed in the route table and tunnel table, have ECMP next hops or FRR backup next hops, and can be used as transport tunnels.
Receiving a /32 BGP-LU IPv4 route with prefix SID attribute does not create a tunnel in the SR database; it only creates a label swap entry when the route is re-advertised with a new next hop. This means that the first SID in any SID list of an SR policy should not be based on a BGP prefix SID because the data path would not be programmed correctly. However, the BGP prefix SID can be used as a non-first SID in any SR policy.
Each node capable of receiving and propagating the BGP prefix SID attribute can be configured with the block-prefix-sid command at the BGP global, group, or neighbor configuration levels to:
block the propagation of the attribute outside its local SR domain
block inbound propagation of the attribute from another SR domain
When block-prefix-sid applies to a BGP session, the prefix SID attribute is stripped from all sent and received routes on that session, even if the prefix SID attribute was added to the outbound routes by the local router. By default, this feature is not configured, so the prefix SID is propagated freely to and from all BGP peers.
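The effect on a single session can be pictured with the following Python sketch. It is an illustrative model only: the Route type and the function name are hypothetical, and how the setting is inherited between the global, group, and neighbor levels is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    label: int
    prefix_sid_index: Optional[int] = None  # None means no prefix SID attribute


def apply_block_prefix_sid(route: Route, block_prefix_sid: bool) -> Route:
    """Model one BGP session; the same rule applies to sent and received routes."""
    if block_prefix_sid:
        # The attribute is stripped even when the local router added it.
        return replace(route, prefix_sid_index=None)
    return route


local_route = Route("10.0.0.1/32", 20101, prefix_sid_index=1)
print(apply_block_prefix_sid(local_route, block_prefix_sid=False))
print(apply_block_prefix_sid(local_route, block_prefix_sid=True))
```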
Configuration
Figure "Example topology" shows the example topology with four nodes in different ASs. The loopback addresses 10.0.0.1/32 on PE-1 and 10.0.0.4/32 on PE-4 are exported in BGP-LU IPv4 routes with prefix SID attribute.
The initial configuration includes:
Cards, MDAs, ports
Router interfaces
eBGP sessions for the label-IPv4 address family
PE-3 and PE-4 have ecmp and multipath max-paths set to 2 for BGP address family label-ipv4
No IGP is configured, so SR-OSPF or SR-ISIS cannot be used.
Configure BGP segment routing using prefix SID
BGP SR is enabled on all PEs. Also, the SRGB is configured and the BGP SR labels are defined as a subset of the SRGB, as follows:
# on PE-1, PE-2, PE-3, PE-4:
configure exclusive
    router "Base" {
        mpls-labels {
            sr-labels {
                start 20000
                end 20999
            }
        }
        bgp {
            segment-routing {
                admin-state enable
                prefix-sid-range {
                    start-label 20100
                    max-index 99
                }
            }
        }
It is possible to define different policies with the sr-label-index action for importing and exporting the prefixes, but in this example, the same policy is used. The following policy is used for exporting and importing prefix 10.0.0.1/32 on PE-1:
# on PE-1:
configure exclusive
    policy-options {
        prefix-list "10.0.0.1/32" {
            prefix 10.0.0.1/32 type exact {
            }
        }
        policy-statement "prefix-sid-1" {
            entry 10 {
                from {
                    prefix-list ["10.0.0.1/32"]
                }
                action {
                    action-type accept
                    sr-label-index {
                        value 1
                    }
                }
            }
        }
    }
Likewise, PE-4 exports prefix 10.0.0.4/32 with SR label index value 4, resulting in a BGP prefix SID 20104 (start label 20100 + index 4 = 20104).
The route-table-import policy-name command is used to populate a local BGP-SR table with SR label 20101 (20100 + 1 = 20101), as follows:
# on PE-1:
configure exclusive
    router "Base" {
        bgp {
            rib-management {
                label-ipv4 {
                    route-table-import {
                        policy-name "prefix-sid-1"
                    }
                }
            }
        }
The export policy is configured in the bgp context, alongside the "eBGP" group and its neighbor, as follows:
# on PE-1:
configure exclusive
    router "Base" {
        bgp {
            group "eBGP" {
                family {
                    label-ipv4 true
                }
            }
            neighbor "192.168.12.2" {
                group "eBGP"
                peer-as 64502
            }
            export {
                policy ["prefix-sid-1"]
            }
        }
The following show commands display the BGP-SR table on the different PEs:
[/]
A:admin@PE-1# show router bgp sr-label
===============================================================================
BGP SR labels
Flags: B - entry has backup next-hop, E - entry has ECMP next-hops
===============================================================================
Prefix Advertised Received Flags
Label Label
-------------------------------------------------------------------------------
10.0.0.1/32 20101 - -
10.0.0.4/32 20104 20104 -
-------------------------------------------------------------------------------
Total Labels allocated: 2
===============================================================================
[/]
A:admin@PE-2# show router bgp sr-label
===============================================================================
BGP SR labels
Flags: B - entry has backup next-hop, E - entry has ECMP next-hops
===============================================================================
Prefix Advertised Received Flags
Label Label
-------------------------------------------------------------------------------
10.0.0.1/32 20101 20101 -
10.0.0.4/32 20104 20104 -
-------------------------------------------------------------------------------
Total Labels allocated: 2
===============================================================================
[/]
A:admin@PE-3# show router bgp sr-label
===============================================================================
BGP SR labels
Flags: B - entry has backup next-hop, E - entry has ECMP next-hops
===============================================================================
Prefix Advertised Received Flags
Label Label
-------------------------------------------------------------------------------
10.0.0.1/32 20101 20101 -
10.0.0.4/32 20104 20104 E
-------------------------------------------------------------------------------
Total Labels allocated: 2
===============================================================================
[/]
A:admin@PE-4# show router bgp sr-label
===============================================================================
BGP SR labels
Flags: B - entry has backup next-hop, E - entry has ECMP next-hops
===============================================================================
Prefix Advertised Received Flags
Label Label
-------------------------------------------------------------------------------
10.0.0.1/32 20101 20101 E
10.0.0.4/32 20104 - -
-------------------------------------------------------------------------------
Total Labels allocated: 2
===============================================================================
Because PE-3 and PE-4 have ECMP and BGP multipath configured, traffic flows can be sprayed over two links. The E flag in the last column indicates that ECMP next hops are available for prefix 10.0.0.4/32 on PE-3 and for prefix 10.0.0.1/32 on PE-4.
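When this output is collected by a script, the label and flag columns can be checked automatically. The following Python sketch parses the data rows of the PE-3 output shown above; the regular expression and the function name are assumptions made for this illustration and are not an SR OS interface.

```python
import re

# One data row: prefix, advertised label, received label, flags.
ROW = re.compile(r"^(\d+\.\d+\.\d+\.\d+/\d+)\s+(\d+|-)\s+(\d+|-)\s+(\S+)$")


def parse_sr_labels(output: str) -> list[dict]:
    """Extract the data rows of 'show router bgp sr-label' output."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, separators, and summary lines
        prefix, advertised, received, flags = match.groups()
        rows.append({
            "prefix": prefix,
            "advertised": None if advertised == "-" else int(advertised),
            "received": None if received == "-" else int(received),
            "ecmp": "E" in flags,
            "backup": "B" in flags,
        })
    return rows


sample = """\
Prefix Advertised Received Flags
-------------------------------------------------------------------------------
10.0.0.1/32 20101 20101 -
10.0.0.4/32 20104 20104 E
-------------------------------------------------------------------------------
Total Labels allocated: 2
"""
rows = parse_sr_labels(sample)
for row in rows:
    print(row)
```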
The tunnel table on PE-1 shows that a tunnel with ID 262145 is available toward destination 10.0.0.4/32:
[/]
A:admin@PE-1# show router tunnel-table
===============================================================================
IPv4 Tunnel Table (Router: Base)
===============================================================================
Destination Owner Encap TunnelId Pref Nexthop Metric
Color
-------------------------------------------------------------------------------
10.0.0.4/32 bgp MPLS 262145 12 192.168.12.2 1000
-------------------------------------------------------------------------------
Flags: B = BGP or MPLS backup hop available
L = Loop-Free Alternate (LFA) hop available
E = Inactive best-external BGP route
k = RIB-API or Forwarding Policy backup hop
===============================================================================
The FP-tunnel table provides more information about the label (20104) and next hop (192.168.12.2):
[/]
A:admin@PE-1# show router fp-tunnel-table 1
===============================================================================
IPv4 Tunnel Table Display
Legend:
label stack is ordered from bottom-most to top-most
B - FRR Backup
===============================================================================
Destination Protocol Tunnel-ID
Lbl/SID
NextHop Intf/Tunnel
Lbl/SID (backup)
NextHop (backup)
-------------------------------------------------------------------------------
10.0.0.4/32 BGP -
20104
192.168.12.2 1/1/c1/1:100
-------------------------------------------------------------------------------
Total Entries : 1
-------------------------------------------------------------------------------
===============================================================================
On PE-2, two tunnels are available: one toward destination 10.0.0.1/32 with SR label 20101 and another toward destination 10.0.0.4/32 with SR label 20104:
[/]
A:admin@PE-2# show router fp-tunnel-table 1
===============================================================================
IPv4 Tunnel Table Display
Legend:
label stack is ordered from bottom-most to top-most
B - FRR Backup
===============================================================================
Destination Protocol Tunnel-ID
Lbl/SID
NextHop Intf/Tunnel
Lbl/SID (backup)
NextHop (backup)
-------------------------------------------------------------------------------
10.0.0.1/32 BGP -
20101
192.168.12.1 1/1/c1/2:100
10.0.0.4/32 BGP -
20104
192.168.23.2 1/1/c1/1:100
-------------------------------------------------------------------------------
Total Entries : 2
-------------------------------------------------------------------------------
===============================================================================
On PE-3, two tunnel entries are available: one toward destination 10.0.0.1/32 with SR label 20101 and one toward destination 10.0.0.4/32 with SR label 20104 and two ECMP next hops:
[/]
A:admin@PE-3# show router fp-tunnel-table 1
===============================================================================
IPv4 Tunnel Table Display
Legend:
label stack is ordered from bottom-most to top-most
B - FRR Backup
===============================================================================
Destination Protocol Tunnel-ID
Lbl/SID
NextHop Intf/Tunnel
Lbl/SID (backup)
NextHop (backup)
-------------------------------------------------------------------------------
10.0.0.1/32 BGP -
20101
192.168.23.1 1/1/c1/2:100
10.0.0.4/32 BGP -
20104
192.168.34.2 1/1/c1/1:100
20104
192.168.34.6 1/1/c1/3:100
-------------------------------------------------------------------------------
Total Entries : 2
-------------------------------------------------------------------------------
===============================================================================
On PE-4, one tunnel entry with two ECMP next hops is available toward destination 10.0.0.1/32 with SR label 20101:
[/]
A:admin@PE-4# show router fp-tunnel-table 1
===============================================================================
IPv4 Tunnel Table Display
Legend:
label stack is ordered from bottom-most to top-most
B - FRR Backup
===============================================================================
Destination Protocol Tunnel-ID
Lbl/SID
NextHop Intf/Tunnel
Lbl/SID (backup)
NextHop (backup)
-------------------------------------------------------------------------------
10.0.0.1/32 BGP -
20101
192.168.34.1 1/1/c1/2:100
20101
192.168.34.5 1/1/c1/3:100
-------------------------------------------------------------------------------
Total Entries : 1
-------------------------------------------------------------------------------
===============================================================================
PE-1 advertised a BGP-LU IPv4 route for prefix 10.0.0.1/32 with label 20101 to PE-2. The following command on PE-2 shows the received route:
[/]
A:admin@PE-2# show router bgp routes 10.0.0.1/32 label-ipv4
===============================================================================
BGP Router ID:192.0.2.2 AS:64502 Local AS:64502
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP LABEL-IPV4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.0.0.1/32 None None
192.168.12.1 None 0
64501 20101
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
This route is advertised to PE-3 and finally to PE-4. The following command on PE-4 shows two BGP-LU IPv4 routes for prefix 10.0.0.1/32 with label 20101: one with next hop 192.168.34.1 and another one with next hop 192.168.34.5.
[/]
A:admin@PE-4# show router bgp routes 10.0.0.1/32 label-ipv4
===============================================================================
BGP Router ID:192.0.2.4 AS:64504 Local AS:64504
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP LABEL-IPV4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 10.0.0.1/32 None None
192.168.34.1 None 0
64503 64502 64501 20101
u*>i 10.0.0.1/32 None None
192.168.34.5 None 0
64503 64502 64501 20101
-------------------------------------------------------------------------------
Routes : 2
===============================================================================
The detailed output for the BGP-LU IPv4 routes on PE-4 shows the prefix SID attribute with index 1 and originator SRGB with start label 20100 and size 100, as follows:
[/]
A:admin@PE-4# show router bgp routes 10.0.0.1/32 label-ipv4 detail
===============================================================================
BGP Router ID:192.0.2.4 AS:64504 Local AS:64504
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP LABEL-IPV4 Routes
===============================================================================
Original Attributes
Network : 10.0.0.1/32
Nexthop : 192.168.34.1
Path Id : None
From : 192.168.34.1
Res. Nexthop : 192.168.34.1
Local Pref. : n/a Interface Name : int-PE-4-PE-3
Aggregator AS : None Aggregator : None
Atomic Aggr. : Not Atomic MED : None
AIGP Metric : None IGP Cost : 0
Connector : None
Community : No Community Members
Cluster : No Cluster Members
Originator Id : None Peer Router Id : 192.0.2.3
Fwd Class : None Priority : None
IPv4 Label : 20101
Flags : Used Valid Best IGP In-TTM In-RTM
Route Source : External
AS-Path : 64503 64502 64501
Route Tag : 0
Neighbor-AS : 64503
DB Orig Val : NotFound Final Orig Val : N/A
Source Class : 0 Dest Class : 0
Add Paths Send : Default
RIB Priority : Normal
Last Modified : 00h01m18s
Prefix SID : index 1, originator-srgb [20100/100]
---snip---
-------------------------------------------------------------------------------
Routes : 2
===============================================================================
The following debug message shows how the prefix SID attribute is advertised in a BGP update:
18 2023/04/17 18:16:28.069 CEST MINOR: DEBUG #2001 Base Peer 1: 192.168.34.1
"Peer 1: 192.168.34.1: UPDATE
Peer 1: 192.168.34.1 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 66
Flag: 0x90 Type: 14 Len: 17 Multiprotocol Reachable NLRI:
Address Family LBL-IPV4
NextHop len 4 NextHop 192.168.34.1
10.0.0.1/32 Label 20101
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 14 AS Path:
Type: 2 Len: 3 < 64503 64502 64501 >
Flag: 0xc0 Type: 40 Len: 21 Prefix-SID-attr:
Label Index TLV (10 bytes):-
flags: 0x0 label Index: 1
Originator SRGB TLV (11 bytes):-
flags: 0x0 start_label: 20100 num_label: 100
"
Configure VPRN
Figure "Example topology with VPRN 1" shows the example topology with a basic VPRN service to demonstrate the end-to-end control plane signaling and data plane verification.
A BGP multi-hop session for address family VPN-IPv4 is configured between the GRT loopback addresses 10.0.0.1/32 on PE-1 and 10.0.0.4/32 on PE-4. On PE-1, the additional BGP configuration is as follows:
# on PE-1:
configure exclusive
    router "Base" {
        bgp {
            group "eBGP-VPN" {
                family {
                    vpn-ipv4 true
                }
            }
            neighbor "10.0.0.4" {
                group "eBGP-VPN"
                multihop 64
                local-address 10.0.0.1
                peer-as 64504
            }
In addition, the VPRN 1 service has loopback addresses 192.168.1.1/32 on PE-1 and 192.168.1.4/32 on PE-4. The configuration on PE-1 is as follows:
# on PE-1:
configure exclusive
    service {
        vprn "VPRN 1" {
            admin-state enable
            service-id 1
            customer "1"
            bgp-ipvpn {
                mpls {
                    admin-state enable
                    route-distinguisher "1:1"
                    vrf-target {
                        community "target:1:1"
                    }
                    auto-bind-tunnel {
                        resolution any
                    }
                }
            }
            interface "lo1" {
                loopback true
                ipv4 {
                    primary {
                        address 192.168.1.1
                        prefix-length 32
                    }
                }
            }
        }
The configuration on PE-4 is similar.
The following VPN-IPv4 route is received on PE-1:
[/]
A:admin@PE-1# show router bgp routes vpn-ipv4
===============================================================================
BGP Router ID:192.0.2.1 AS:64501 Local AS:64501
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP VPN-IPv4 Routes
===============================================================================
Flag Network LocalPref MED
Nexthop (Router) Path-Id IGP Cost
As-Path Label
-------------------------------------------------------------------------------
u*>i 4:1:192.168.1.4/32 None None
10.0.0.4 None 0
64504 524286
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
The route table for VPRN 1 on PE-1 is as follows:
[/]
A:admin@PE-1# show router 1 route-table
===============================================================================
Route Table (Service: 1)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
192.168.1.1/32 Local Local 00h01m31s 0
lo1 0
192.168.1.4/32 Remote BGP VPN 00h01m14s 170
10.0.0.4 (tunneled:BGP) 1000
-------------------------------------------------------------------------------
No. of Routes: 2
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
Conclusion
BGP SR makes it possible to deploy SR without an IGP (for example, to cross AS boundaries) and to stitch SR-IGP and SR-BGP tunnels together. BGP SR relies on the BGP prefix SID attribute, which carries the SR label index and, optionally, the originator SRGB.