Layer 2 Multicast Optimization for EVPN-VXLAN — Assisted Replication
This chapter provides information about Layer 2 Multicast Optimization for EVPN-VXLAN — Assisted Replication.
Applicability
This chapter was initially written for SR OS Release 14.0.R4, but the CLI in the current edition is based on SR OS Release 23.3.R3. Layer 2 multicast optimization for EVPN-VXLAN - Assisted Replication (AR) is supported in SR OS Release 14.0.R4, and later.
Overview
Typically, EVPN-VXLAN can use either Ingress Replication (IR) or Protocol Independent Multicast (PIM) for Broadcast, Unknown unicast, and Multicast (BUM) traffic (although SR OS does not support PIM along with EVPN-VXLAN). PIM requires keeping multicast state awareness per subnet per tenant in the core routers, which may not scale. Not all core routers support PIM.
IR inefficiency is usually tolerable in EVPN networks for broadcast and unknown unicast traffic; however, it is not tolerable for multicast traffic:
Broadcast traffic can be reduced by the proxy-ARP and proxy-ND capabilities supported by EVPN.
Unknown unicast traffic is greatly reduced in virtualized Data Center (DC) networks where all MAC and IP addresses are learned in the control or management planes. In such cases, unknown MAC addresses are always outside the DC. An unknown-mac-route can be enabled to ensure that the unknown unicast traffic is sent only to the DC gateway, which minimizes flooding within the DC.
Multicast traffic may be an issue for the hypervisors holding the multicast sources, because the hypervisors need to replicate the multicast traffic to the remote VXLAN Tunnel Endpoints (VTEPs). The multicast replication at the hypervisors is a software process and the throughput can be heavily impacted. This is also true when VPLS services are used in the Virtual Service Router (VSR) and many replicas must be done from the VSR. Using a dedicated service node to replicate the multicast traffic on behalf of the hypervisors can help, but the replication capabilities of such service nodes are limited too.
SR OS supports the Assisted Replication (AR) feature for IPv4 VXLAN tunnels (both replicator and leaf functions) in compliance with the non-selective mode described in draft-ietf-bess-evpn-optimized-ir. AR is a Layer 2 multicast optimization feature that helps software-based PEs and Network Virtualization Edge (NVE) devices with low-performance replication capabilities to deliver Broadcast and Multicast (BM) Layer 2 traffic to remote VTEPs in the VPLS.
SR OS nodes support the AR-Replicator (AR-R) and AR-Leaf (AR-L) functions, although not simultaneously on the same service. Nodes configured as AR-L select an AR-R within a service and send all BM packets to this AR-R. AR-Rs replicate traffic to all the VTEPs in the VPLS on behalf of the AR-Ls, so BM traffic is delivered to all VPLS participants without any packet loss caused by performance issues. Unknown unicast packets follow the same path as known unicast packets to avoid packet reordering. Therefore, no AR-R is used for unknown unicast traffic.
When multiple AR-Rs exist in a service, the AR-L performs per-service load-balancing of the BM traffic. The AR-L lists the candidate AR-Rs, ordered by IP address and VXLAN Network Identifier (VNI); candidate 0 having the lowest IP address and VNI. The replicator is selected using a modulo function of the service ID and the number of candidate AR-Rs. For example, assume that VPLS 1 has two candidate AR-Rs: because 1 modulo 2 equals 1, the second AR-R in the list is selected. In case of failure, a new AR-R is selected. If there are no more AR-Rs, the system falls back to IR.
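The selection rule above can be illustrated with a minimal Python sketch. The function name `select_replicator` and the tuple layout are hypothetical and only model the ordering and modulo steps described in this chapter; the sketch is not the SR OS implementation.

```python
import ipaddress

def select_replicator(service_id, candidates):
    """Return the (ip, vni) tuple of the selected AR-R, or None to fall back to IR.

    candidates: iterable of (vtep_ip_string, vni) tuples (hypothetical layout).
    """
    # Order by numeric IP address first, then by VNI; candidate 0 is the lowest.
    ordered = sorted(candidates, key=lambda c: (ipaddress.IPv4Address(c[0]), c[1]))
    if not ordered:
        return None  # no AR-R available: the AR-L falls back to ingress replication
    # Service ID modulo the number of candidates gives the index in the ordered list.
    return ordered[service_id % len(ordered)]

# VPLS 1 with two candidate AR-Rs: 1 mod 2 = 1, so the second candidate is selected.
print(select_replicator(1, [("2.2.2.2", 1), ("1.1.1.1", 1)]))
```

If a candidate is removed after a failure, calling the function again with the reduced list models the reselection; an empty list models the fallback to IR.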
PMSI Tunnel Attribute - Flags shows an EVPN route-type 3, an Inclusive Multicast Ethernet Tag (IMET) route containing a PMSI tunnel attribute with a flags octet. Flag L was already defined in RFC 6514. Draft-ietf-bess-evpn-optimized-ir defines additional flags: type, BM, and U. The BM and U flags are used for Pruned Flood Lists (PFL) signaling and they are not supported.
The type field has two bits that define the AR role of the advertising router, as follows:
Type 00 = Regular Network Virtualization Edge (RNVE) - indicates that AR is not supported and IR is applied instead (for backward compatibility)
Type 01 = AR-R
Type 10 = AR-L
Type 11 = reserved
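As an illustration, the flags octet can be decoded with a few bit operations. The bit positions used in this sketch (type = mask 0x18, BM = 0x04, U = 0x02, L = 0x01) are an assumption inferred from the debug output in this chapter, where 0x8 is shown as AR Replicator and 0x10 as AR Leaf; the function and dictionary names are hypothetical.

```python
# AR role encoded in the two-bit type field, as listed above.
AR_TYPES = {0b00: "RNVE", 0b01: "AR-R", 0b10: "AR-L", 0b11: "reserved"}

def decode_pta_flags(flags):
    """Decode the PMSI tunnel attribute flags octet (assumed bit layout)."""
    return {
        "type": AR_TYPES[(flags >> 3) & 0b11],  # assumed mask 0x18
        "BM": (flags >> 2) & 1,                 # assumed mask 0x04 (PFL, not used here)
        "U": (flags >> 1) & 1,                  # assumed mask 0x02 (PFL, not used here)
        "L": flags & 1,                         # leaf information required flag
    }

for value in (0x00, 0x08, 0x10):
    print(hex(value), decode_pta_flags(value))
```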
The tunnel type in the PMSI tunnel attribute can be configured with the following options for IR and AR:
Tunnel type 0x06 = (non-optimized) IR, sent by AR-R and AR-L if ingress-repl-inc-mcast-advertisement is enabled, which is the default option
Tunnel type 0x0A = type AR, originated by AR-R
For regular IR routes, the originating router's IP address equals the system IP address. The MPLS label and tunnel identifier must be used as described in RFC 7432. The tunnel identifier is set to a routable address of the PE.
For AR routes, the originating router's IP address and the tunnel identifier are both set to the AR IP address (AR-IP) configured in the service system vxlan context. The AR-IP must be previously defined as a loopback interface address in the base router and must be different from the IR IP address (IR-IP).
If the AR-IP loopback interface is down, the router does not withdraw the AR route. However, the remote AR-Ls are not able to resolve the AR route's BGP next-hop if the AR-IP is no longer propagated in the IGP.
EVPN Assisted Replication for VXLAN shows the example topology with the multicast source connected to a hypervisor PE-3 that acts as AR-L, which sends an IR route containing the system address of PE-3. The AR-R PE-1 sends an AR route that uses AR-IPs instead of IR-IPs; for example, PE-1 has AR-IP 1.1.1.1 and IR-IP 192.0.2.1.
Hypervisor PE-3 sends the BM traffic to the AR-R, which replicates it to all the VTEPs in the VPLS, except to PE-3.
Inclusive multicast route information sent by different AR roles shows the inclusive multicast route information sent by each role in an AR-capable service.
AR role | Function | Inclusive multicast route advertised |
---|---|---|
AR-R | Assists AR-Ls | IR inclusive multicast route (tunnel = 0x06 = IR, IR-IP, type = 0 = none); AR inclusive multicast route (tunnel = 0x0A = AR, AR-IP, type = 1 = AR-R) |
AR-L | Sends BM only to AR-R | IR inclusive multicast route (tunnel = 0x06 = IR, IR-IP, type = 2 = AR-L) |
RNVE | Non-AR support | IR inclusive multicast route (tunnel = 0x06 = IR, IR-IP, type = 0 = none) |
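The table can also be read as a small lookup. The following Python sketch is illustrative only: the function `imet_routes` and its `advertise_ir` parameter are hypothetical names, and the sketch assumes that the IR route is simply omitted when IR advertisement is disabled.

```python
IR_TUNNEL, AR_TUNNEL = 0x06, 0x0A  # tunnel types used for IR and AR routes

def imet_routes(role, ir_ip, ar_ip=None, advertise_ir=True):
    """Return the inclusive multicast routes a node advertises for one service."""
    routes = []
    if advertise_ir:
        # Only an AR-L sets a non-zero AR type in its IR route.
        ar_type = 2 if role == "AR-L" else 0
        routes.append({"tunnel": IR_TUNNEL, "ip": ir_ip, "type": ar_type})
    if role == "AR-R":
        if ar_ip is None:
            raise ValueError("an AR-R needs an AR-IP")
        routes.append({"tunnel": AR_TUNNEL, "ip": ar_ip, "type": 1})
    return routes

# An AR-R with IR-IP 192.0.2.1 and AR-IP 1.1.1.1 advertises an IR and an AR route.
for route in imet_routes("AR-R", "192.0.2.1", ar_ip="1.1.1.1"):
    print(route)
```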
Unicast traffic (known or unknown) is processed as normal. For BM traffic, the AR-R uses AR or IR based on the IP destination address (DA):
If IP DA equals the AR-IP, the AR-R replicates to the VTEPs in the VXLAN service, except for the VTEP over which the BM traffic was received.
If IP DA equals the IR-IP, normal IR forwarding is done.
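A minimal Python sketch of this decision follows. The function name and arguments are hypothetical. For the IR case, the sketch returns no VXLAN targets, which assumes the usual split-horizon behavior where the ingress node has already replicated the packet to every VTEP; local forwarding to SAPs is not modeled.

```python
def ar_r_flood_targets(dst_ip, src_vtep, ar_ip, ir_ip, vteps):
    """Decide how an AR-R handles a BM packet received over VXLAN.

    Returns a (mode, vxlan_targets) tuple.
    """
    if dst_ip == ar_ip:
        # Assisted replication: copy to every VTEP except the one it came from.
        return "AR", [v for v in vteps if v != src_vtep]
    if dst_ip == ir_ip:
        # Normal IR forwarding: no replication to other VTEPs (assumption, see above).
        return "IR", []
    raise ValueError("destination is neither the AR-IP nor the IR-IP")

# PE-3 (192.0.2.3) sends BM traffic to AR-IP 1.1.1.1 on PE-1.
print(ar_r_flood_targets("1.1.1.1", "192.0.2.3", "1.1.1.1", "192.0.2.1",
                         ["192.0.2.3", "192.0.2.4", "192.0.2.5"]))
```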
Non-optimized-IR nodes are unaware of the PMSI tunnel attribute flag definition with the additional flags for AR, so they ignore the information in the flags field.
The draft-ietf-bess-evpn-optimized-ir document describes the following three types of IR optimizations:
Non-selective AR - the chosen AR-R replicates the BM traffic to all NVEs in the Ethernet VPN Instance (EVI) except for the source NVE.
Selective AR - AR-Rs replicate BM traffic to only their AR-L set and the rest of the AR-Rs. Selective AR allows a "multi-stage" AR replication, as opposed to a "single-stage" AR replication.
Pruned Flood Lists - AR-Ls can signal PFL flags to be pruned from the flood lists for BM or for unknown unicast traffic. PFL may be used in combination with AR.
This chapter only describes non-selective AR.
Configure AR-R and AR-L
The AR-IP is configured on the AR-R, as follows:
configure {
    service {
        system {
            vxlan {
                assisted-replication {
                    ip-address ?
 ip-address <unicast-ipv4-address>
 <unicast-ipv4-address> - <d.d.d.d>
    IP address for assisted replication in the router
The AR-IP is the IPv4 address of a loopback interface in the base router instance. If the AR-IP is configured before a loopback interface with that address exists, the following error message is raised:
configure {
    service {
        system {
            vxlan {
                assisted-replication {
                    ip-address 1.1.1.1
MINOR: MGMT_CORE #4001: configure service system vxlan assisted-replication ip-address
    - loopback interface with address (max prefix) needed for assisted replication
    - configure router "Base"
The AR types replicator and leaf are configured in a VPLS with the following command:
configure {
    service {
        vpls "VPLS 10" {
            vxlan {
                instance 1 {
                    vni 1
                    assisted-replication ?
 assisted-replication
 Choice: role
 leaf       :+ Enter the leaf context
 replicator :- AR role as replicator
When attempting to configure an AR-R before the AR-IP is set, the following error is raised:
configure {
    service {
        vpls "VPLS 10" {
            customer "1"
            service-id 10
            vxlan {
                instance 1 {
                    vni 1
                    assisted-replication {
                        replicator
MINOR: MGMT_CORE #4001: configure service vpls "VPLS 10" vxlan instance 1 assisted-replication replicator
    - assisted-replication ip address needed for replication role
    - configure service system vxlan assisted-replication ip-address
The acttime parameter can only be configured on leaf nodes. The following error is raised after an attempt to configure acttime on an AR-R:
configure {
    service {
        vpls "VPLS 10" {
            vxlan {
                instance 1 {
                    assisted-replication {
                        replicator {
                            acttime 5
MINOR: MGMT_CORE #2201: Unknown element - 'acttime'
The acttime can optionally be configured on an AR-L and works as follows. When the router creates an AR-R destination for the first time, the acttime timer must expire before this AR-R destination is eligible as a candidate AR-R to forward BM traffic. Upon expiration, the router runs the AR-R selection (service ID modulo the number of AR-Rs provides the selected AR-R in the ordered list of candidate AR-Rs). The EVPN destination for the selected AR-R is created as "BM" and the destinations to the remaining nodes are shown as "U".
The acttime allows the AR-R some time to program the leaf VTEPs in the following cases:
Configuration of a new AR-R
AR-R rebooting
AR-R going operationally down and up again
If the timer is zero (default value), the AR-R may receive packets from a VTEP that has not been programmed yet, in which case the AR-R drops the packets.
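The effect of the timer on the candidate list can be sketched in Python as follows. The class `ReplicatorDest`, its `created_at` field, and both functions are hypothetical; the sketch only models the rule that a newly created AR-R destination becomes eligible after acttime seconds, and it is not the SR OS implementation.

```python
from dataclasses import dataclass
import ipaddress

@dataclass(frozen=True)
class ReplicatorDest:
    ip: str
    vni: int
    created_at: float  # time (seconds) at which the AR-R destination was created

def pending_time(dest, now, acttime):
    """Seconds left before the destination becomes an eligible candidate."""
    return max(0.0, acttime - (now - dest.created_at))

def select_eligible(service_id, dests, now, acttime=0):
    """Run the modulo selection over the destinations whose timer has expired."""
    eligible = [d for d in dests if pending_time(d, now, acttime) == 0]
    eligible.sort(key=lambda d: (ipaddress.IPv4Address(d.ip), d.vni))
    if not eligible:
        return None  # no eligible AR-R: fall back to IR
    return eligible[service_id % len(eligible)]

# AR-R 1.1.1.1 comes back at t=100 after a reboot; acttime is 30 seconds.
dests = [ReplicatorDest("2.2.2.2", 1, created_at=0.0),
         ReplicatorDest("1.1.1.1", 1, created_at=100.0)]
print(select_eligible(10, dests, now=110.0, acttime=30).ip)  # timer still running
print(select_eligible(10, dests, now=130.0, acttime=30).ip)  # timer expired
```

In this example, service 10 keeps using 2.2.2.2 while the timer for 1.1.1.1 is running and moves to 1.1.1.1 once the timer has expired, which gives the rebooted AR-R time to program its leaf VTEPs.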
With the AR-Rs and AR-Ls configured, IMET AR routes can be exchanged. IR can be enabled or disabled independently of the AR configuration. The following command is required to enable IR inclusive multicast routes, and is enabled by default:
configure {
    service {
        vpls "VPLS 10" {
            bgp-evpn {
                routes {
                    incl-mcast {
                        advertise-ingress-replication true
BGP-EVPN routes
By default, IR is enabled in BGP-EVPN. The following IMET IR route is sent from PE-5 (RNVE) to Route Reflector (RR) PE-1. The flags in the PMSI Tunnel Attribute (PTA) indicate that regular IR is used to forward BUM traffic (tunnel type: 0x06). The AR type is "None", because AR is disabled on PE-5. The IR-IP 192.0.2.5 is used as next-hop, originator IP address, and tunnel endpoint. The MPLS label corresponds to the VNI.
A:admin@PE-5# //
A:PE-5# show debug
debug
    router "Base"
        bgp
            update
        exit
    exit
exit
A:PE-5# //
On PE-5:
14 2023/07/12 10:59:55.416 CEST MINOR: DEBUG #2001 Base Peer 1: 192.0.2.1
"Peer 1: 192.0.2.1: UPDATE
Peer 1: 192.0.2.1 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 77
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 192.0.2.5
Type: EVPN-INCL-MCAST Len: 17 RD: 192.0.2.5:1, tag: 0, orig_addr len: 32, orig_addr: 192.0.2.5
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:1
bgp-tunnel-encap:VXLAN
Flag: 0xc0 Type: 22 Len: 9 PMSI:
Tunnel-type Ingress Replication (6)
Flags: (0x0)[Type: None BM: 0 U: 0 Leaf: not required]
MPLS Label 1
Tunnel-Endpoint 192.0.2.5
"
A similar IMET IR route is sent from AR-L PE-3 toward RR PE-1, as follows. The difference is that the flags indicate that PE-3 is configured as an AR-L for the VPLS. The IR-IP 192.0.2.3 is used as next-hop, originator address, and tunnel endpoint.
On PE-3:
10 2023/07/12 10:58:29.634 CEST MINOR: DEBUG #2001 Base Peer 1: 192.0.2.1
"Peer 1: 192.0.2.1: UPDATE
Peer 1: 192.0.2.1 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 77
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 192.0.2.3
Type: EVPN-INCL-MCAST Len: 17 RD: 192.0.2.3:1, tag: 0, orig_addr len: 32, orig_addr: 192.0.2.3
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:1
bgp-tunnel-encap:VXLAN
Flag: 0xc0 Type: 22 Len: 9 PMSI:
Tunnel-type Ingress Replication (6)
Flags: (0x10)[Type: AR Leaf BM: 0 U: 0 Leaf: not required]
MPLS Label 1
Tunnel-Endpoint 192.0.2.3
"
The IMET IR routes contain the system IP addresses of the nodes, not the AR-IPs.
The following AR route is advertised from AR-R PE-1. The tunnel type is AR and the flags indicate that PE-1 is configured as AR-R. The AR-IP 1.1.1.1 is the next-hop address, the originator address, and the tunnel endpoint.
On PE-1:
4 2023/07/12 10:55:15.069 CEST MINOR: DEBUG #2001 Base Peer 1: 192.0.2.4
"Peer 1: 192.0.2.4: UPDATE
Peer 1: 192.0.2.4 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 77
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 1.1.1.1
Type: EVPN-INCL-MCAST Len: 17 RD: 192.0.2.1:1, tag: 0, orig_addr len: 32, orig_addr: 1.1.1.1
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:1
bgp-tunnel-encap:VXLAN
Flag: 0xc0 Type: 22 Len: 9 PMSI:
Tunnel-type Assisted Replication (10)
Flags: (0x8)[Type: AR Replicator BM: 0 U: 0 Leaf: not required]
MPLS Label 1
Tunnel-Endpoint 1.1.1.1
"
Besides IMET AR routes, PE-1 may also advertise IMET IR routes to the other nodes using IR-IP 192.0.2.1 (system IP address). By default, BGP-EVPN has IR enabled. For example, the following IMET IR route is advertised to PE-4:
On PE-1:
3 2023/07/12 10:55:15.069 CEST MINOR: DEBUG #2001 Base Peer 1: 192.0.2.4
"Peer 1: 192.0.2.4: UPDATE
Peer 1: 192.0.2.4 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 77
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 192.0.2.1
Type: EVPN-INCL-MCAST Len: 17 RD: 192.0.2.1:1, tag: 0, orig_addr len: 32, orig_addr: 192.0.2.1
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:1
bgp-tunnel-encap:VXLAN
Flag: 0xc0 Type: 22 Len: 9 PMSI:
Tunnel-type Ingress Replication (6)
Flags: (0x0)[Type: None BM: 0 U: 0 Leaf: not required]
MPLS Label 1
Tunnel-Endpoint 192.0.2.1
"
The following IMET routes have been received by PE-4:
[/]
A:admin@PE-4# show router bgp routes evpn incl-mcast
===============================================================================
BGP Router ID:192.0.2.4 AS:64500 Local AS:64500
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN Inclusive-Mcast Routes
===============================================================================
Flag Route Dist. OrigAddr
Tag NextHop
-------------------------------------------------------------------------------
u*>i 192.0.2.1:1 1.1.1.1
0 1.1.1.1
u*>i 192.0.2.1:1 192.0.2.1
0 192.0.2.1
u*>i 192.0.2.2:1 2.2.2.2
0 2.2.2.2
u*>i 192.0.2.2:1 192.0.2.2
0 192.0.2.2
u*>i 192.0.2.3:1 192.0.2.3
0 192.0.2.3
u*>i 192.0.2.5:1 192.0.2.5
0 192.0.2.5
-------------------------------------------------------------------------------
Routes : 6
===============================================================================
Configuration
Example topology shows the example topology with PE-1 and PE-2 as AR-R nodes, PE-3 and PE-4 as AR-L nodes, and PE-5 as RNVE node. The multicast source is connected to PE-3, which is a low-performance node. PE-1 acts as an RR for all nodes.
The initial configuration on the nodes includes:
Cards, MDAs, ports
Router interfaces between the nodes
IS-IS as IGP (alternatively, OSPF can be used)
BGP is configured for address family EVPN with RR PE-1. The BGP configuration on PE-1 is as follows:
On PE-1:
configure {
    router "Base" {
        autonomous-system 64500
        bgp {
            vpn-apply-export true
            vpn-apply-import true
            rapid-withdrawal true
            split-horizon true
            ebgp-default-reject-policy {
                import false
                export false
            }
            rapid-update {
                evpn true
            }
            group "DC" {
                peer-as 64500
                family {
                    evpn true
                }
                cluster {
                    cluster-id 192.0.2.1
                }
            }
            neighbor "192.0.2.2" {
                group "DC"
            }
            neighbor "192.0.2.3" {
                group "DC"
            }
            neighbor "192.0.2.4" {
                group "DC"
            }
            neighbor "192.0.2.5" {
                group "DC"
            }
        }
The BGP configuration on the other nodes is as follows:
On the other PEs:
configure {
    router "Base" {
        autonomous-system 64500
        bgp {
            vpn-apply-export true
            vpn-apply-import true
            rapid-withdrawal true
            split-horizon true
            ebgp-default-reject-policy {
                import false
                export false
            }
            rapid-update {
                evpn true
            }
            group "DC" {
                peer-as 64500
                family {
                    evpn true
                }
            }
            neighbor "192.0.2.1" {
                group "DC"
            }
        }
VPLS 10 is configured on all nodes. PE-1 is configured as AR-R with AR-IP 1.1.1.1. This address must first be configured as a loopback IPv4 address in the base router and then as the AR-IP in the service system vxlan context, where it can be shared between services. When attempting to configure an AR-IP with an IP address that does not exist in the base router, the following error is raised:
configure {
    service {
        system {
            vxlan {
                assisted-replication {
                    ip-address 1.1.1.1
MINOR: MGMT_CORE #4001: configure service system vxlan assisted-replication ip-address
    - loopback interface with address (max prefix) needed for assisted replication
    - configure router "Base"
First, a loopback interface is configured in the base router. The IP address needs to be routable and, in this example, an export policy exporting this IP address is configured in IS-IS. Alternatively, a static route can be configured or an additional IS-IS passive interface can be configured for the loopback interface. The IP address is then configured as AR-IP in the service system vxlan context. PE-1 is configured as AR-R for VPLS 10, as follows:
On PE-1:
configure {
    policy-options {
        prefix-list "AR-IP" {
            prefix 1.1.1.1/32 type exact {
            }
        }
        policy-statement "export_AR-IP" {
            entry 10 {
                from {
                    prefix-list ["AR-IP"]
                }
                action {
                    action-type accept
                }
            }
        }
    }
    router "Base" {
        interface "AR-IP" {
            loopback
            ipv4 {
                primary {
                    address 1.1.1.1
                    prefix-length 32
                }
            }
        }
        isis 0 {
            export-policy ["export_AR-IP"]
        }
    }
    service {
        system {
            vxlan {
                assisted-replication {
                    ip-address 1.1.1.1
                }
            }
        }
        vpls "VPLS 10" {
            customer "1"
            service-id 10
            admin-state enable
            vxlan {
                instance 1 {
                    vni 1
                    assisted-replication {
                        replicator
                    }
                }
            }
            bgp 1 {
            }
            bgp-evpn {
                evi 1
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                }
            }
        }
    }
The configuration is similar on PE-2, but with AR-IP 2.2.2.2 instead of 1.1.1.1.
PE-3 and PE-4 are configured as AR-L nodes for VPLS 10. No AR-IP needs to be configured. The configuration of VPLS 10 on PE-3 is as follows:
On PE-3:
configure {
    service {
        vpls "VPLS 10" {
            admin-state enable
            service-id 10
            customer "1"
            vxlan {
                instance 1 {
                    vni 1
                    assisted-replication {
                        leaf { }
                    }
                }
            }
            bgp 1 {
            }
            bgp-evpn {
                evi 1
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                }
            }
            sap 1/1/c3/1 { # sap for ingress traffic from STC
            }
            sap 1/2/c1/1:1 { # sap for egress traffic to VPLS 10
            }
Multicast traffic enters SAP 1/1/c3/1, whereas receiving hosts can be connected to other SAPs, such as SAP 1/2/c1/1:1. The configuration of VPLS 10 on PE-4 is similar, but no multicast source is connected.
When a node is configured as AR-L, the acttime can optionally be configured to define the waiting time before the leaf can begin sending multicast traffic to a new replicator or to a replicator that was rebooted. The default is zero seconds, in which case the AR-L starts sending packets to the AR-R without delay. Nokia recommends configuring an acttime value different from zero.
configure {
    service {
        vpls "VPLS 10" {
            vxlan {
                instance 1 {
                    vni 1
                    assisted-replication {
                        leaf {
                            acttime ?
 acttime <number>
 <number> - <1..255> - seconds
    Time for the leaf to wait before sending traffic to a new replicator
PE-5 is configured as an RNVE node for VPLS 10, as follows:
On PE-5:
configure {
    service {
        vpls "VPLS 10" {
            admin-state enable
            service-id 10
            customer "1"
            vxlan {
                instance 1 {
                    vni 1
                }
            }
            bgp 1 {
            }
            bgp-evpn {
                evi 1
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                }
            }
            sap 1/2/c1/1:1 { # sap for egress traffic to VPLS 10
            }
BGP-EVPN IMET routes are exchanged between the nodes. The following IMET routes are used on AR-L PE-3, with two routes from each AR-R: one IR route with BGP next-hop 192.0.2.x and one AR route with BGP next-hop x.x.x.x (with x equal to 1 or 2).
[/]
A:admin@PE-3# show router bgp routes evpn incl-mcast
===============================================================================
BGP Router ID:192.0.2.3 AS:64500 Local AS:64500
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN Inclusive-Mcast Routes
===============================================================================
Flag Route Dist. OrigAddr
Tag NextHop
-------------------------------------------------------------------------------
u*>i 192.0.2.1:1 1.1.1.1
0 1.1.1.1
u*>i 192.0.2.1:1 192.0.2.1
0 192.0.2.1
u*>i 192.0.2.2:1 2.2.2.2
0 2.2.2.2
u*>i 192.0.2.2:1 192.0.2.2
0 192.0.2.2
u*>i 192.0.2.4:1 192.0.2.4
0 192.0.2.4
u*>i 192.0.2.5:1 192.0.2.5
0 192.0.2.5
-------------------------------------------------------------------------------
Routes : 6
===============================================================================
When the AR-R has no local attachment circuits, such as SAPs or SDP-bindings, it should not generate regular IR routes. This can be controlled by disabling advertise-ingress-replication on PE-1 and PE-2, as follows:
On PE-1 and PE-2:
configure {
    service {
        vpls "VPLS 10" {
            bgp-evpn {
                routes {
                    incl-mcast {
                        advertise-ingress-replication false
When IR is disabled on the AR-Rs, no IR routes are sent to the other nodes and PE-3 only sees the AR routes from PE-1 and PE-2, as follows:
[/]
A:admin@PE-3# show router bgp routes evpn incl-mcast
===============================================================================
BGP Router ID:192.0.2.3 AS:64500 Local AS:64500
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN Inclusive-Mcast Routes
===============================================================================
Flag Route Dist. OrigAddr
Tag NextHop
-------------------------------------------------------------------------------
u*>i 192.0.2.1:1 1.1.1.1
0 1.1.1.1
u*>i 192.0.2.2:1 2.2.2.2
0 2.2.2.2
u*>i 192.0.2.4:1 192.0.2.4
0 192.0.2.4
u*>i 192.0.2.5:1 192.0.2.5
0 192.0.2.5
-------------------------------------------------------------------------------
Routes : 4
===============================================================================
The detailed information about the AR route sent by AR-R PE-1 can be shown with the following command. The AR tunnel has endpoint 1.1.1.1.
[/]
A:admin@PE-3# show router bgp routes evpn incl-mcast rd 192.0.2.1:1 hunt
===============================================================================
BGP Router ID:192.0.2.3 AS:64500 Local AS:64500
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP EVPN Inclusive-Mcast Routes
===============================================================================
-------------------------------------------------------------------------------
RIB In Entries
-------------------------------------------------------------------------------
Network : n/a
Nexthop : 1.1.1.1
Path Id : None
From : 192.0.2.1
---snip---
Community : target:64500:1 bgp-tunnel-encap:VXLAN
Cluster : No Cluster Members
Originator Id : None Peer Router Id : 192.0.2.1
Flags : Used Valid Best IGP
Route Source : Internal
AS-Path : No As-Path
EVPN type : INCL-MCAST
Tag : 0
Originator IP : 1.1.1.1
Route Dist. : 192.0.2.1:1
Route Tag : 0
---snip---
-------------------------------------------------------------------------------
PMSI Tunnel Attributes :
Tunnel-type : Assisted Replication
Flags : Type: AR-Replicator(1) BM: 0 U: 0 Leaf: not required
MPLS Label : VNI 1
Tunnel-Endpoint: 1.1.1.1
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
RIB Out Entries
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Routes : 1
===============================================================================
The following command shows the VXLAN destinations for VPLS 10 on PE-3:
[/]
A:admin@PE-3# show service id 10 vxlan destinations
===============================================================================
Egress VTEP, VNI (Instance 1)
===============================================================================
VTEP Address Egress VNI Oper Mcast Num
State MACs
-------------------------------------------------------------------------------
1.1.1.1 1 Up BM 0
2.2.2.2 1 Up - 0
192.0.2.4 1 Up U 0
192.0.2.5 1 Up U 0
-------------------------------------------------------------------------------
Number of Egress VTEP, VNI : 4
-------------------------------------------------------------------------------
---snip---
===============================================================================
PE-3 is configured as AR-L and no acttime is defined (default). Four egress VTEPs are listed: the system IP addresses are used for IR routes and the AR-IPs are used for AR routes. All BM traffic is forwarded to AR-IP 1.1.1.1 on PE-1. The AR-R in use is selected by the modulo operation on the service ID (10). In this example, two AR-Rs are available, and the service ID modulo 2 equals zero: 10 mod 2 = 0. This is the lowest possible outcome, so the first AR-R in the ordered candidate list is used. The AR-Rs are ordered by IP and VNI, with candidate 0 the lowest IP and VNI.
[/]
A:admin@PE-3# show service id 10 vxlan assisted-replication replicator
===============================================================================
Vxlan AR Replicator Candidates
===============================================================================
Inst VTEP Address Egr VNI In Use In Candidate List Pending Time
-------------------------------------------------------------------------------
1 1.1.1.1 1 yes yes 0
1 2.2.2.2 1 no yes 0
-------------------------------------------------------------------------------
Number of entries : 2
-------------------------------------------------------------------------------
===============================================================================
Within a service, no load-sharing is done between the AR-Rs. However, different AR-Rs can be used for different services.
If PE-3 were configured as AR-L in VPLS 11, the calculation would be as follows: 11 mod 2 = 1; therefore, the second AR-R in the list would be selected.
If three AR-Rs were available for VPLS 11, the calculation would be: 11 mod 3 = 2, so the third AR-R in the list would be used.
If different VNIs are configured for the AR-Rs, the AR-R with the lowest IP address is still listed first, even when its VNI is higher. This can be shown when the VPLS VXLAN configuration on PE-1 is modified with VNI 99 instead of VNI 1, as follows:
On PE-1:
configure {
    service {
        vpls "VPLS 10" {
            bgp-evpn {
                delete vxlan 1
            }
            delete vxlan
configure {
    service {
        vpls "VPLS 10" {
            vxlan {
                instance 1 {
                    vni 99
                    assisted-replication {
                        replicator
configure {
    service {
        vpls "VPLS 10" {
            bgp-evpn {
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
The list of AR-Rs on PE-3 shows that the first entry is the VTEP with the lowest IP address (1.1.1.1), even though the VNI 99 is higher than 1:
[/]
A:admin@PE-3# show service id 10 vxlan assisted-replication replicator
===============================================================================
Vxlan AR Replicator Candidates
===============================================================================
Inst VTEP Address Egr VNI In Use In Candidate List Pending Time
-------------------------------------------------------------------------------
1 1.1.1.1 99 yes yes 0
1 2.2.2.2 1 no yes 0
-------------------------------------------------------------------------------
Number of entries : 2
-------------------------------------------------------------------------------
===============================================================================
If the AR-IP loopback interface is down, BGP does not withdraw the AR route. When the route to the AR-IP is signaled using IGP, the route is removed from the routing table and the AR-L selects another AR-R. However, when a static route is defined for the AR-IP, a black-hole exists when the AR-IP interface is down.
PE-5 is configured as an RNVE node that signals regular IMET IR routes and is unaware of the AR-R and AR-L roles in the EVI. RNVE nodes ignore IMET AR routes. In the example, only PE-3, PE-4, and PE-5 send IMET IR updates, so the list of VTEP addresses on PE-5 only contains PE-3 and PE-4, as follows:
[/]
A:admin@PE-5# show service id 10 vxlan destinations
===============================================================================
Egress VTEP, VNI (Instance 1)
===============================================================================
VTEP Address Egress VNI Oper Mcast Num
State MACs
-------------------------------------------------------------------------------
192.0.2.3 1 Up BUM 0
192.0.2.4 1 Up BUM 0
-------------------------------------------------------------------------------
Number of Egress VTEP, VNI : 2
-------------------------------------------------------------------------------
---snip---
===============================================================================
The RNVE is unaware of AR-Rs; therefore, the list of AR-Rs is empty on PE-5:
[/]
A:admin@PE-5# show service id 10 vxlan assisted-replication replicator
===============================================================================
Vxlan AR Replicator Candidates
===============================================================================
Inst VTEP Address Egr VNI In Use In Candidate List Pending Time
-------------------------------------------------------------------------------
No Matching Entries
===============================================================================
Verification of multicast traffic
The multicast source connected to PE-3 generates multicast traffic. PE-3 acts as AR-L and forwards the multicast packets to AR-R PE-1. In this example topology, multicast traffic enters port 1/1/c3/1 on PE-3 and is forwarded to egress port 1/1/c1/1 toward PE-1. Port statistics are cleared and traffic is generated, then the port statistics are verified.
[/]
A:admin@PE-3# show port 1/1/c1/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c1/1 82 8878
48890 75662990
===============================================================================
[/]
A:admin@PE-3# show port 1/1/c2/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c2/1 67 7654
68 7912
===============================================================================
[/]
A:admin@PE-3# show port 1/1/c3/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c3/1 48809 73213500
0 0
===============================================================================
Besides the multicast traffic, IGP signaling is sent and received on the network interfaces. This explains why the egress packet counter on the network interface 1/1/c1/1 toward PE-1 is slightly higher than the ingress packet counter on the interface 1/1/c3/1 toward the multicast source. The egress octet counter is also higher because each multicast packet is VXLAN-encapsulated before it is sent to PE-1. No multicast traffic is forwarded to PE-2, which is an AR-R candidate that is not in use, because AR-L PE-3 selected PE-1 for VPLS 10.
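As a rough consistency check on the PE-3 counters above, the following Python sketch separates the multicast traffic from the remaining traffic on port 1/1/c1/1. It assumes an untagged IPv4 underlay, so that the VXLAN encapsulation adds 50 bytes per packet (outer Ethernet 14, IPv4 20, UDP 8, VXLAN 8); that overhead value and the variable names are assumptions and are not stated in this chapter.

```python
# Counters taken from the PE-3 outputs above.
INGRESS_PKTS, INGRESS_OCTETS = 48809, 73213500  # port 1/1/c3/1, ingress
EGRESS_PKTS, EGRESS_OCTETS = 48890, 75662990    # port 1/1/c1/1, egress

# Assumed per-packet overhead: outer Ethernet + IPv4 + UDP + VXLAN headers.
VXLAN_OVERHEAD = 14 + 20 + 8 + 8

# Average size of a multicast packet as received from the source.
frame_size = INGRESS_OCTETS // INGRESS_PKTS

# Octets expected on the network port if every packet is encapsulated once.
expected_mcast_octets = INGRESS_PKTS * (frame_size + VXLAN_OVERHEAD)

# Whatever is left over is attributed to other traffic, such as signaling.
other_pkts = EGRESS_PKTS - INGRESS_PKTS
other_octets = EGRESS_OCTETS - expected_mcast_octets

print(f"average multicast frame size : {frame_size} bytes")
print(f"expected encapsulated octets : {expected_mcast_octets}")
print(f"remaining packets / octets   : {other_pkts} / {other_octets}")
```

Under these assumptions, the multicast packets are 1500 bytes each and only 81 packets (9040 octets) on the egress side are not accounted for by the multicast stream, which is in line with a small amount of signaling traffic.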
When the AR-R PE-1 receives the multicast traffic from PE-3, it forwards the traffic to PE-4 and PE-5 within the VXLAN service. The VXLAN information for VPLS 10 on PE-1 shows that PE-2 is not in the list of egress VTEPs. The reason is that PE-2 does not have any SAPs or SDP-bindings and no IMET IR route is sent by PE-2 because advertise-ingress-replication is disabled.
[/]
A:admin@PE-1# show service id 10 vxlan destinations
===============================================================================
Egress VTEP, VNI (Instance 1)
===============================================================================
VTEP Address Egress VNI Oper Mcast Num
State MACs
-------------------------------------------------------------------------------
192.0.2.3 1 Up BUM 0
192.0.2.4 1 Up BUM 0
192.0.2.5 1 Up BUM 0
-------------------------------------------------------------------------------
Number of Egress VTEP, VNI : 3
-------------------------------------------------------------------------------
---snip---
===============================================================================
AR-R PE-1 receives the multicast traffic from PE-3 on port 1/1/c2/1 and forwards it to egress ports 1/1/c3/1 toward PE-5 and 1/1/c4/1 toward PE-4. No multicast traffic needs to be forwarded to egress port 1/1/c1/1 toward PE-2. Source squelching ensures that the traffic is not sent back to the originator AR-L PE-3. PE-1 has no local SAPs or SDP-bindings. The port statistics on PE-1 are as follows:
[/]
A:admin@PE-1# show port 1/1/c1/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c1/1 45 4959
45 5077
===============================================================================
[/]
A:admin@PE-1# show port 1/1/c2/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c2/1 48855 75659086
44 4823
===============================================================================
[/]
A:admin@PE-1# show port 1/1/c3/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c3/1 47 5055
48857 75659322
===============================================================================
[/]
A:admin@PE-1# show port 1/1/c4/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c4/1 47 5118
48855 75659050
===============================================================================
An egress AR-L or RNVE node performs regular egress BUM forwarding procedures. Packets are replicated to local SAPs or SDP-bindings, but not to VXLAN-bindings.
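The forwarding rules described in this section can be summarized in a small Python sketch. It is a simplified model rather than the SR OS implementation: the function names are hypothetical, VTEPs are plain strings, and the local SAP name in the usage example is invented for illustration.

```python
def ar_r_replicate(source_vtep, egress_vteps):
    """Copies made by an AR-R for a BM packet received from an AR-L.

    Source squelching: the originating AR-L is skipped, so the packet
    is never sent back to the VTEP it came from.
    """
    return [vtep for vtep in egress_vteps if vtep != source_vtep]


def egress_replicate(local_saps, sdp_bindings):
    """Copies made by an egress AR-L or RNVE node.

    Packets go to local SAPs and SDP-bindings only; nothing is
    replicated back into VXLAN-bindings.
    """
    return list(local_saps) + list(sdp_bindings)


# Egress VTEPs of VPLS 10 on AR-R PE-1, as listed in the output above.
pe1_vteps = ["192.0.2.3", "192.0.2.4", "192.0.2.5"]

# BM packet arriving from AR-L PE-3 (192.0.2.3).
copies_from_pe1 = ar_r_replicate("192.0.2.3", pe1_vteps)

# Hypothetical egress node with a single local SAP and no SDP-bindings.
copies_on_egress = egress_replicate(["sap-1"], [])
print(copies_from_pe1, copies_on_egress)
```

For the example traffic, the AR-R produces copies for 192.0.2.4 and 192.0.2.5 only, which corresponds to the egress counters on ports 1/1/c4/1 and 1/1/c3/1 of PE-1.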
AR-R failure scenarios
When the AR-IP interface on the selected AR-R goes down for any reason, the route to this AR-IP is removed from the routing table on AR-L PE-3, and PE-3 selects AR-R PE-2. To simulate an AR-R failure, the AR-IP interface on PE-1 is disabled, as follows:
On PE-1:
configure {
router "Base" {
interface "AR-IP" {
admin-state disable
After a while, the routing table on PE-3 no longer contains an entry for prefix 1.1.1.1/32, as follows:
[/]
A:admin@PE-3# show router route-table 1.1.1.1/32
===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
No. of Routes: 0
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================
AR-R PE-1 is no longer eligible because its AR-IP is not reachable. PE-2 is now selected as AR-R, so BM traffic is forwarded to PE-2. Log 99 on PE-3 shows the change in AR-R from PE-1 to PE-2, as follows:
On PE-3:
136 2023/07/12 11:34:57.482 CEST MINOR: SVCMGR #2090 Base
"Assisted replicator in service 10 changed to VTEP 2.2.2.2, Egress VNI 1 vxlan-instance 1."
The VXLAN destinations for VPLS 10 on PE-3 do not include VTEP 1.1.1.1 anymore, as follows:
[/]
A:admin@PE-3# show service id 10 vxlan destinations
===============================================================================
Egress VTEP, VNI (Instance 1)
===============================================================================
VTEP Address Egress VNI Oper Mcast Num
State MACs
-------------------------------------------------------------------------------
2.2.2.2 1 Up BM 0
192.0.2.4 1 Up U 0
192.0.2.5 1 Up U 0
-------------------------------------------------------------------------------
Number of Egress VTEP, VNI : 3
-------------------------------------------------------------------------------
---snip---
===============================================================================
Only PE-2 is listed as AR-R for VPLS 10 on PE-3, and PE-2 is the selected AR-R for VPLS 10, as follows:
[/]
A:admin@PE-3# show service id 10 vxlan assisted-replication replicator
===============================================================================
Vxlan AR Replicator Candidates
===============================================================================
Inst VTEP Address Egr VNI In Use In Candidate List Pending Time
-------------------------------------------------------------------------------
1 2.2.2.2 1 yes yes 0
-------------------------------------------------------------------------------
Number of entries : 1
-------------------------------------------------------------------------------
===============================================================================
Incoming multicast traffic on port 1/1/c3/1 on PE-3 is now forwarded to port 1/1/c2/1 toward PE-2, as follows:
[/]
A:admin@PE-3# show port 1/1/c1/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c1/1 61 6793
60 6694
===============================================================================
[/]
A:admin@PE-3# show port 1/1/c2/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c2/1 45 5497
48855 75660441
===============================================================================
[/]
A:admin@PE-3# show port 1/1/c3/1 statistics
===============================================================================
Port Statistics on Slot 1
===============================================================================
Port Ingress Packets Ingress Octets
Id Egress Packets Egress Octets
-------------------------------------------------------------------------------
1/1/c3/1 48810 73215000
0 0
===============================================================================
When the AR-IP interface on AR-R PE-2 is also disabled, no AR-R remains available and PE-3 reverts to IR.
On PE-2:
configure {
router "Base" {
interface "AR-IP" {
admin-state disable
The following log 99 message on AR-L PE-3 indicates that there is no AR-R anymore (VTEP 0.0.0.0, Egress VNI 0).
On PE-3:
2 2023/07/12 11:38:34.902 CEST MINOR: SVCMGR #2090 Base
"Assisted replicator in service 10 changed to VTEP 0.0.0.0, Egress VNI 0 vxlan-instance 1."
The list of VXLAN destinations for VPLS 10 on PE-3 does not include any AR-R (VTEP 1.1.1.1 or 2.2.2.2) anymore, as follows:
[/]
A:admin@PE-3# show service id 10 vxlan destinations
===============================================================================
Egress VTEP, VNI (Instance 1)
===============================================================================
VTEP Address Egress VNI Oper Mcast Num
State MACs
-------------------------------------------------------------------------------
192.0.2.4 1 Up BUM 0
192.0.2.5 1 Up BUM 0
-------------------------------------------------------------------------------
Number of Egress VTEP, VNI : 2
-------------------------------------------------------------------------------
---snip---
===============================================================================
[/]
A:admin@PE-3# show service id 10 vxlan assisted-replication replicator
===============================================================================
Vxlan AR Replicator Candidates
===============================================================================
Inst VTEP Address Egr VNI In Use In Candidate List Pending Time
-------------------------------------------------------------------------------
No Matching Entries
===============================================================================
In this case, IR is done for all BUM traffic toward PE-4 and PE-5.
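The failure scenarios above follow a simple pattern that the Python sketch below models: an AR-R candidate is eligible only while its AR-IP has a route in the routing table of the AR-L, and the AR-L falls back to IR when no candidate is eligible. The function name and data structures are hypothetical, and picking the first eligible candidate is an assumption made for illustration; the sketch does not model how SR OS chooses between several eligible AR-Rs.

```python
def ar_l_flood_plan(candidates, reachable_prefixes, ir_vteps):
    """Return the Mcast flag per egress VTEP as seen on the AR-L.

    candidates         -- AR-IPs learned from IMET AR routes
    reachable_prefixes -- AR-IPs that currently have a route on the AR-L
    ir_vteps           -- VTEPs learned from IMET IR routes
    """
    eligible = [ar_ip for ar_ip in candidates if ar_ip in reachable_prefixes]
    if not eligible:
        # No AR-R left: regular ingress replication of all BUM traffic.
        return {vtep: "BUM" for vtep in ir_vteps}

    selected = eligible[0]  # assumption: first eligible candidate is used
    plan = {selected: "BM"}  # broadcast and multicast go to the AR-R
    plan.update({vtep: "U" for vtep in ir_vteps})  # unknown unicast stays IR
    return plan


candidates = ["1.1.1.1", "2.2.2.2"]
ir_vteps = ["192.0.2.4", "192.0.2.5"]

# AR-IP on PE-1 disabled: only 2.2.2.2 is still in the routing table.
plan_pe1_down = ar_l_flood_plan(candidates, {"2.2.2.2"}, ir_vteps)

# AR-IP on PE-1 and PE-2 disabled: no candidate is reachable.
plan_both_down = ar_l_flood_plan(candidates, set(), ir_vteps)

print(plan_pe1_down)
print(plan_both_down)
```

The two resulting plans carry the same Mcast flags as the `show service id 10 vxlan destinations` outputs on PE-3 after each failure: BM toward 2.2.2.2 with U toward PE-4 and PE-5 in the first case, and BUM toward PE-4 and PE-5 in the second.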
Conclusion
AR uses replicators to forward broadcast and multicast traffic on behalf of nodes with limited replication performance that are configured as AR-Ls. AR is primarily used for L2 multicast optimization in data centers, but may also be used in any network using overlay EVPN-VXLAN tunnels.