EVPN for VXLAN tunnels (Layer 2)
This chapter describes the components of EVPN-VXLAN Layer 2 on SR Linux.
EVPN-VXLAN L2 basic configuration
Basic configuration of EVPN-VXLAN L2 on SR Linux consists of the following:
A vxlan-interface, which contains the ingress VNI of the incoming VXLAN packets associated with the vxlan-interface
A MAC-VRF network-instance, where the vxlan-interface is attached. Only one vxlan-interface can be attached to a MAC-VRF network-instance.
BGP-EVPN is also enabled in the same MAC-VRF with a minimum configuration of the EVI and the network-instance vxlan-interface associated with it.
The BGP instance under BGP-EVPN has an encapsulation-type leaf, which is VXLAN by default.
For EVPN, this determines that the BGP encapsulation extended community is advertised with value VXLAN and the value encoded in the label fields of the advertised NLRIs is a VNI.
If the route-distinguisher or route-target/policies are not configured, the required values are automatically derived from the configured EVI as follows:
The route-distinguisher is derived as <ip-address:evi>, where ip-address is the IPv4 address of the default network-instance subinterface system0.0.
The route-target is derived as <asn:evi>, where asn is the autonomous system configured in the default network-instance.
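The auto-derivation rules above can be sketched in a few lines of Python. This is only an illustration of the string formats described in the text; the function names and the sample values are hypothetical and are not part of SR Linux.

```python
import ipaddress


def derive_rd(system_ipv4: str, evi: int) -> str:
    """Return the auto-derived route-distinguisher in <ip-address:evi> form."""
    # Parsing the address makes a malformed system0.0 address fail early.
    ip = ipaddress.IPv4Address(system_ipv4)
    return f"{ip}:{evi}"


def derive_rt(asn: int, evi: int) -> str:
    """Return the auto-derived route-target in <asn:evi> form."""
    return f"{asn}:{evi}"


# Hypothetical inputs: system0.0 address 10.0.0.1, AS 64490, EVI 10.
print(derive_rd("10.0.0.1", 10))
print(derive_rt(64490, 10))
```

With these sample inputs, both values end in the EVI, so two MAC-VRFs with different EVIs on the same node never share an auto-derived RD.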
The following example shows a basic EVPN-VXLAN L2 configuration consisting of a vxlan-interface, MAC-VRF network-instance, and BGP-EVPN configuration:
--{ candidate shared default }--[ ]--
# info
...
tunnel-interface vxlan1 {
vxlan-interface 1 {
type bridged
ingress {
vni 10
}
egress {
source-ip use-system-ipv4-address
}
}
}
// In the network-instance:
A:dut2# network-instance blue
--{ candidate shared default }--[ network-instance blue ]--
# info
type mac-vrf
admin-state enable
description "Blue network instance"
interface ethernet-1/2.1 {
}
vxlan-interface vxlan1.1 {
}
protocols {
bgp-evpn {
bgp-instance 1 {
admin-state enable
vxlan-interface vxlan1.1
evi 10
}
}
bgp-vpn {
bgp-instance 1 {
// rd and rt are auto-derived from evi if this context is not configured
export-policy pol-def-1
import-policy pol-def-1
route-distinguisher {
route-distinguisher 64490:200
}
route-target {
export-rt target:64490:200
import-rt target:64490:100
}
}
}
}
EVPN L2 basic routes
EVPN Layer 2 (without multi-homing) includes the implementation of the BGP-EVPN address family and support for the following route types:
EVPN MAC/IP route (or type 2, RT2)
EVPN Inclusive Multicast Ethernet Tag route (IMET or type 3, RT3)
The MAC/IP route is used to convey the MAC and IP information of hosts connected to subinterfaces in the MAC-VRF. The IMET route is advertised as soon as bgp-evpn is enabled in the MAC-VRF; it has the following purposes:
Auto-discovery of the remote VTEPs attached to the same EVI
Creation of a default flooding list in the MAC-VRF so that BUM frames are replicated
Advertisement of the MAC/IP and IMET routes is configured on a per-MAC-VRF basis. The following example shows the default setting advertise true, which advertises MAC/IP and IMET routes.
Note that changing the setting of the advertise parameter and committing the change internally flaps the BGP instance.
--{ candidate shared default }--[ network-instance blue protocols bgp-evpn bgp-instance 1 ]--
# info detail
admin-state enable
vxlan-interface vxlan1.1
evi 1
ecmp 1
default-admin-tag 0
routes {
next-hop use-system-ipv4-address
mac-ip {
advertise true
}
inclusive-mcast {
advertise true
}
}
Creation of VXLAN destinations based on received EVPN routes
The creation of VXLAN destinations of type unicast, unicast ES (Ethernet Segment), and multicast for each vxlan-interface is driven by the reception of EVPN routes.
The created unicast, unicast ES, and multicast VXLAN destinations are visible in state. Each destination is allocated a system-wide unique destination index, which is an internal NHG-ID (next-hop group ID). The destination indexes for the VXLAN destinations are shown in the following example for destination 10.44.44.4, vni 1:
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state tunnel-interface vxlan1 vxlan-interface 1 bridge-table unicast-destinations destination * vni *
tunnel-interface vxlan1 {
vxlan-interface 1 {
bridge-table {
unicast-destinations {
destination 10.44.44.4 vni 1 {
destination-index 677716962904 // destination index
statistics {
}
mac-table {
mac 00:00:00:01:01:04 {
type evpn-static
last-update "16 hours ago"
}
}
}
}
}
}
}
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance blue bridge-table mac-table mac 00:00:00:01:01:04
network-instance blue {
bridge-table {
mac-table {
mac 00:00:00:01:01:04 {
destination-type vxlan
destination-index 677716962904 // destination index
type evpn-static
last-update "16 hours ago"
destination "vxlan-interface:vxlan1.1 vtep:10.44.44.4 vni:1"
}
}
}
}
The following is an example of dynamically created multicast destinations for a vxlan-interface:
--{ [FACTORY] + candidate shared default }--[ ]--
A:dut1# info from state tunnel-interface vxlan1 vxlan-interface 1 bridge-table multicast-destinations
tunnel-interface vxlan1 {
vxlan-interface 1 {
bridge-table {
multicast-destinations {
destination 40.1.1.2 vni 1 {
multicast-forwarding BUM
destination-index 46428593833
}
destination 40.1.1.3 vni 1 {
multicast-forwarding BUM
destination-index 46428593835
}
destination 40.1.1.4 vni 1 {
multicast-forwarding BUM
destination-index 46428593829
}
}
}
}
}
EVPN route selection
When a MAC is received from multiple sources, the route is selected based on the priority listed in MAC selection. Learned and EVPN-learned routes have equal priority; the latest received route is selected.
When multiple EVPN-learned MAC/IP routes arrive for the same MAC but with a different key (for example, two routes for MAC M1 with different route-distinguishers), a selection is made based on the following priority:
EVPN MACs with higher SEQ number
EVPN MACs with lower IP next-hop
EVPN MACs with lower Ethernet Tag
EVPN MACs with lower RD
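The priority list above maps naturally onto a composite sort key. The following Python sketch is one way to express it; the `MacIpRoute` class is hypothetical, and the numeric comparison of a route-distinguisher written as `<ipv4>:<number>` is an assumption made for illustration, because the text does not state how RDs are compared.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MacIpRoute:
    seq: int        # sequence number carried with the MAC/IP route
    next_hop: str   # IPv4 next hop
    eth_tag: int    # Ethernet Tag
    rd: str         # route-distinguisher, assumed to look like "<ipv4>:<number>"


def _rd_key(rd: str):
    # Assumption: the RD is compared numerically, field by field.
    admin, assigned = rd.rsplit(":", 1)
    return (int(ipaddress.IPv4Address(admin)), int(assigned))


def select_best(routes):
    """Pick one route: higher SEQ, then lower next hop, lower Ethernet Tag, lower RD."""
    return min(
        routes,
        key=lambda r: (
            -r.seq,                                  # higher sequence number first
            int(ipaddress.IPv4Address(r.next_hop)),  # then numerically lower next hop
            r.eth_tag,                               # then lower Ethernet Tag
            _rd_key(r.rd),                           # then lower RD
        ),
    )
```

Comparing next hops as integers rather than strings matters here: as text, "10.0.0.10" sorts before "10.0.0.9", which would invert the intended order.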
BGP next hop configuration for EVPN routes
You can configure the BGP next hop to be used for the EVPN routes advertised for a network-instance. This next hop is by default the IPv4 address configured in interface system0.0 of the default network-instance. However, the next-hop address can be changed to any IPv4 address.
The system does not check that the configured IP address exists in the default network-instance. Any valid IP address can be used as next hop of the EVPN routes advertised for the network-instance, irrespective of its existence in any subinterface of the system. However, the receiver leaf nodes create their unicast, multicast and ES destinations to this advertised next-hop, so it is important that the configured next-hop is a valid IPv4 address that exists in the default network-instance.
When the system or loopback interface configured for the BGP next-hop is administratively disabled, EVPN still advertises the routes, as long as a valid IP address is available for the next-hop. However, received traffic on that interface is dropped.
The following example configures a BGP next hop to be used for the EVPN routes advertised for a network-instance.
--{ candidate shared default }--[ network-instance 1 protocols bgp-evpn bgp-instance 1 ]--
# info
routes {
next-hop 1.1.1.1
}
MAC duplication detection for Layer 2 loop prevention in EVPN
MAC loop prevention in EVPN broadcast domains is based on the SR Linux MAC duplication feature (see MAC duplication detection and actions), but considers MACs that are learned via EVPN as well. The feature detects MAC duplication for MACs moving among bridge subinterfaces of the same MAC-VRF, as well as MACs moving between bridge subinterfaces and EVPN in the same MAC-VRF, but not for MACs moving from a VTEP to a different VTEP (via EVPN) in the same MAC-VRF.
Also, when a MAC is declared a duplicate and the blackhole configuration option is added to the interface, the system discards incoming frames on bridged subinterfaces whose MAC SA or DA matches the blackhole MAC, as well as frames encapsulated in VXLAN packets whose source or destination MAC matches the blackhole MAC in the mac-table.
When a MAC exceeds the allowed num-moves, the MAC is moved to type duplicate (irrespective of the type of move: EVPN-to-local, local-to-local, or local-to-EVPN), and the EVPN application receives an update that advertises the MAC with a higher sequence number (which may trigger the duplication in other nodes). The "duplicate" MAC can be overwritten by a higher priority type, or flushed by the tools command (see Deleting entries from the bridge table).
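As a rough illustration of the move counting described above, the following Python sketch tracks one MAC and declares it a duplicate once it exceeds the allowed number of moves. The `MacMoveTracker` class and the sliding `window_seconds` parameter are hypothetical; the text here names only num-moves, so the window is an assumption. All EVPN sources are represented by the single location "evpn", so a move from one VTEP to another is not counted, which matches the scope stated above.

```python
from collections import deque


class MacMoveTracker:
    """Counts moves of one MAC inside a sliding time window (illustrative only)."""

    def __init__(self, num_moves: int, window_seconds: float):
        self.num_moves = num_moves      # allowed moves before declaring a duplicate
        self.window = window_seconds    # hypothetical monitoring window
        self.moves = deque()            # timestamps of recent moves
        self.location = None            # last source: a subinterface name or "evpn"
        self.duplicate = False

    def learn(self, location: str, now: float) -> bool:
        """Record a learning event; return True once the MAC is a duplicate."""
        if self.duplicate:
            return True
        if self.location is not None and location != self.location:
            self.moves.append(now)
            # Forget moves that have fallen out of the window.
            while self.moves and now - self.moves[0] > self.window:
                self.moves.popleft()
            if len(self.moves) > self.num_moves:
                self.duplicate = True
        self.location = location
        return self.duplicate
```

A MAC that flaps between a local subinterface and EVPN quickly trips the threshold, while one that moves occasionally never accumulates enough moves inside the window.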
EVPN L2 multi-homing
SR Linux supports single-active multi-homing and all-active multi-homing, as defined in RFC 7432. The EVPN multi-homing implementation uses the following SR Linux features:
System network-instance
A system network-instance container hosts the configuration and state of EVPN for multi-homing.
BGP network-instance
The ES model uses a BGP instance from which the RD/RT and export/import policies are taken to advertise and process the multi-homing ES routes. Only one BGP instance is allowed, and all the ESes are configured under this BGP instance. The RD/RTs cannot be configured when the BGP instance is associated with the system network-instance; however, the operational RD/RTs are still shown in state.
Ethernet Segments
An ES has an admin-state (disabled by default) setting that must be toggled to change any of the parameters that affect the EVPN control plane. In particular, the ESes support the following:
General and per-ES boot and activation timers.
Manual 10-byte ESI configuration.
All-active and single-active multi-homing modes.
DF Election algorithm type Default (modulo based) or type Preference.
Configuration of ES and AD per-ES routes next-hop, and ES route originating-IP per ES.
An AD per ES route is advertised per mac-vrf, where the route carries the network-instance RD and RT.
Association with an interface that can be of type Ethernet or LAG. When associated with a LAG, the LAG can be static or LACP-based. In case of LACP, the same system-id/system-priority/port-key settings must be configured on all the nodes attached to the same ES.
Aliasing load balancing
This hashing operation for aliasing load balancing uses the following hash fields in the incoming frames by default:
For IP traffic: IP DA and IP SA, Layer 4 source and destination ports, protocol, VLAN ID.
For Ethernet (non-IP) traffic: MAC DA and MAC SA, VLAN ID, Ethertype.
For IPv6 addresses, 32-bit fields are generated by XORing and folding the 128-bit address. The packet fields are supplied as input to the hashing computation.
Reload-delay timer
The reload-delay timer configures an interface to be shut down for a period of time following a node reboot or an IMM reset to avoid black-holing traffic.
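For the aliasing load-balancing hash described above, the reduction of an IPv6 address to a 32-bit field can be pictured with the following Python sketch. The exact folding used by the hardware is not specified in this chapter; XORing the four 32-bit words together is an assumption, and `fold_ipv6_to_32` is a hypothetical helper name.

```python
import ipaddress


def fold_ipv6_to_32(addr: str) -> int:
    """XOR the four 32-bit words of an IPv6 address into one 32-bit value."""
    value = int(ipaddress.IPv6Address(addr))
    folded = 0
    for _ in range(4):
        folded ^= value & 0xFFFFFFFF  # take the lowest 32 bits
        value >>= 32                  # shift the next word into place
    return folded


# 2001:db8::1 has the words 0x20010db8, 0, 0, and 0x00000001.
print(hex(fold_ipv6_to_32("2001:db8::1")))
```

Folding keeps every bit of the address involved in the result, so two flows that differ only in the upper or lower half of the address still produce different hash inputs.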
EVPN L2 multi-homing procedures
EVPN relies on three different procedures to handle multi-homing: DF election, split-horizon, and aliasing. DF election is relevant to single-active and all-active multi-homing; split-horizon and aliasing are relevant only to all-active multi-homing.
DF Election – The Designated Forwarder (DF) is the leaf that forwards BUM traffic in the ES. Only one DF can exist per ES at a time, and it is elected based on the exchange of ES routes (type 4) and the subsequent DF Election Algorithm (DF Election Alg).
In single-active multi-homing, the non-DF leafs bring down the subinterface associated with the ES.
In all-active multi-homing, the non-DF leafs do not forward BUM traffic received from remote EVPN PEs.
Split-horizon – This is the mechanism by which BUM traffic received from a peer ES PE is filtered so that it is not looped back to the CE that first transmitted the frame. Local bias is applied in VXLAN services, as described in RFC 8365.
Aliasing – This is the procedure by which PEs that are not attached to the ES can process non-zero ESI MAC/IP routes and AD routes and create ES destinations to which per-flow ECMP can be applied.
To support multi-homing, EVPN-VXLAN supports two additional route types:
ES routes (type 4) – Used for ES discovery on all the leafs attached to the ES and DF Election.
ES routes use an ES-import route target extended community (its value is derived from the ESI), so that its distribution is limited to only the leafs that are attached to the ES.
The ES route is advertised with the DF Election extended community, which indicates the intent to use a specific DF Election Alg and capabilities.
Upon reception of the remote ES routes, each PE builds a DF candidate list based on the originator IP of the ES routes. Then, based on the agreed DF Election Alg, each PE elects one of the candidates as DF for each mac-vrf where the ES is defined.
AD route (type 1) – Advertised to the leafs attached to an ES. There are two versions of AD routes:
AD per-ES route – Used to advertise the multi-homing mode (single-active or all-active) and the ESI label, which is not advertised or processed in case of VXLAN. Its withdrawal enables the mass withdrawal procedures in the remote PEs.
AD per-EVI route – Used to advertise the availability of an ES in an EVI and its VNI. It is needed by the remote leafs for the aliasing procedures.
Both versions of AD routes can influence the DF Election. Their withdrawal from a leaf results in removing that leaf from consideration for DF Election for the associated EVI, as long as ac-df exclude is configured. (The AC-DF capability can be set to exclude only if the DF election algorithm type is set to preference.)
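The default (modulo-based) DF election mentioned above follows the service-carving idea in RFC 7432: order the candidates by IP address and pick the one whose position equals the service identifier modulo the number of candidates. The Python sketch below illustrates that idea only; `elect_df_modulo` is a hypothetical name, and which identifier is used as `service_id` (the EVI is assumed in the example) is not stated in this chapter.

```python
import ipaddress


def elect_df_modulo(candidate_ips, service_id: int) -> str:
    """Return the DF: the candidate at ordinal (service_id mod N) in ascending IP order."""
    if not candidate_ips:
        raise ValueError("DF candidate list is empty")
    ordered = sorted(candidate_ips, key=lambda ip: int(ipaddress.IPv4Address(ip)))
    return ordered[service_id % len(ordered)]


# Two ES peers; the service identifiers 10 and 11 are assumed values.
print(elect_df_modulo(["40.1.1.2", "40.1.1.1"], 10))
print(elect_df_modulo(["40.1.1.2", "40.1.1.1"], 11))
```

Because every PE builds the same ordered list from the received ES routes, each one reaches the same result independently, and different services spread across the ES peers.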
EVPN-VXLAN local bias for all-active multi-homing
Local bias for all-active multi-homing is based on the following behavior at the ingress and egress leafs:
At the ingress leaf, any BUM traffic received on an all-active multi-homing LAG subinterface (associated with an EVPN-VXLAN mac-vrf) is flooded to all local subinterfaces, irrespective of their DF or NDF status, and VXLAN tunnels.
At the egress leaf, any BUM traffic received on a VXLAN subinterface (associated with an EVPN-VXLAN mac-vrf) is flooded to single-homed subinterfaces and, if the leaf is DF for the ES, to multi-homed subinterfaces whose ES is not shared with the owner of the source VTEP.
In SR Linux, the local bias filtering entries on the egress leaf are added or removed based on the ES routes, and they are not modified by the removal of AD per EVI/ES routes. This may cause blackholes in the multi-homed CE for BUM traffic if the local subinterfaces are administratively disabled.
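The egress-leaf rule above can be read as a small filter over the local subinterfaces. The following Python sketch models it under simplifying assumptions: the `Subinterface` class, the ES names, and the set of ESes shared with the source VTEP are all hypothetical inputs, and the sketch ignores everything except the flooding decision itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subinterface:
    name: str
    es: Optional[str] = None  # None means the subinterface is single-homed
    is_df: bool = False       # whether this leaf is DF for the subinterface's ES


def egress_flood_targets(subinterfaces, es_shared_with_source_vtep):
    """Return the local subinterfaces that receive BUM traffic arriving over VXLAN."""
    targets = []
    for sub in subinterfaces:
        if sub.es is None:
            targets.append(sub.name)   # single-homed: always flooded
        elif sub.es in es_shared_with_source_vtep:
            continue                   # local bias: the ingress leaf already flooded this ES
        elif sub.is_df:
            targets.append(sub.name)   # multi-homed, ES not shared, and this leaf is DF
    return targets
```

The skipped branch is what prevents duplicates: the ingress leaf floods to its own ES subinterfaces regardless of DF state, so the egress leaf must not send the same frame to a CE on a shared ES.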
Single-active multi-homing
EVPN L2 single-homing configuration shows a single-active ES attached to two leaf nodes. In this configuration, the ES in single-active mode can be configured to do the following:
Associate to an Ethernet interface or a LAG interface (as with all-active ESes)
Coexist with all-active ESes on the same node, as well as in the same MAC-VRF service.
Signal the non-DF state to the CE by using LACP out-of-synch signaling or power off.
Optionally, the ES can be configured not to signal to the CE. When the LACP synch flag or power off is used to signal the non-DF state to the CE/server, all of the subinterfaces are active on the same node; that is, load balancing is not per-service, but rather per-port. This mode of operation is known as EVPN multi-homing port-active mode.
Connect to a CE that uses a single LAG to connect to the ES or separate LAG/ports per leaf in the ES.
All peers in the ES must be configured with the same multi-homing mode; if the nodes are not configured consistently, the oper-multi-homing-mode in state is single-active. From a hardware resource perspective, no local-bias-table entries are populated for ESes in single-active mode.
The following features work in conjunction with single-active mode:
Preference-based DF election / non-revertive option configures ES peers to elect a DF based on a preference value, as well as an option to prevent traffic from reverting back to a former DF node.
Attachment Circuit influenced DF Election (AC-DF) allows the DF election candidate list for a network-instance to be modified based on the presence of the AD per-EVI and per-ES routes.
Standby LACP-based or power-off signaling configures how the node’s non-DF state is signaled to the multi-homed CE.
Preference-based DF election / non-revertive option
Preference-based DF election is defined in draft-ietf-bess-evpn-pref-df and specifies a way for the ES peers to elect a DF based on a preference value (highest preference-value wins). The draft-ietf-bess-evpn-pref-df document also defines a non-revertive mode, so that upon recovery of a former DF node, traffic does not revert to the node. This is desirable in most cases to avoid double impact on the traffic (failure and recovery).
The configuration requires the command df-election/algorithm/type preference and the corresponding df-election/algorithm/preference-alg/preference-value. Optionally, you can set non-revertive mode to true. See EVPN multi-homing configuration example.
All of the peers in the ES should be configured with the same algorithm type. However, if that is not the case, all the peers fall back to the default algorithm/oper-type.
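The two rules in this section (highest preference-value wins, and a fallback to the default algorithm when the peers disagree) can be sketched in Python as follows. The function names are hypothetical, and breaking a preference tie on the lowest originator IP is an assumption for the sake of a deterministic result; the non-revertive behavior is not modeled.

```python
import ipaddress


def oper_algorithm(peer_types) -> str:
    """Return "preference" only if every ES peer uses it; otherwise fall back to "default"."""
    types = list(peer_types)
    return "preference" if types and all(t == "preference" for t in types) else "default"


def elect_df_preference(candidates) -> str:
    """candidates: iterable of (originator_ip, preference_value); highest preference wins."""
    entries = list(candidates)
    if not entries:
        raise ValueError("DF candidate list is empty")
    # Assumption: a tie on preference is broken by the numerically lowest IP address.
    best = min(entries, key=lambda c: (-c[1], int(ipaddress.IPv4Address(c[0]))))
    return best[0]


# Hypothetical peers: one left at the default preference 32767, one set to 100.
print(elect_df_preference([("40.1.1.1", 100), ("40.1.1.2", 32767)]))
print(oper_algorithm(["preference", "default"]))
```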
Attachment Circuit influenced DF Election (AC-DF)
AC-DF refers to the ability to modify the DF election candidate list for a network-instance based on the presence of the AD per-EVI and per-ES routes. When enabled (ac-df include command), a node cannot become DF if it has the ES subinterface in administratively disabled state. The AC-DF capability is defined in RFC 8584, and it is by default enabled.
The AC-DF capability should be disabled (ac-df exclude command) when single-active multi-homing is used and standby-signaling (lacp or power-off) signals the non-DF state to the multi-homed CE/server. In this case, the same node must be DF for all the contained subinterfaces. Administratively disabling one subinterface does not cause a DF switchover for the network-instance if ac-df exclude is configured.
The AC-DF capability is configured with the command df-election algorithm preference-alg capabilities ac-df; include is the default. See EVPN multi-homing configuration example.
Standby LACP-based or power-off signaling
Standby LACP-based or power-off signaling is used for cases where the AC-DF capability is excluded, and the DF election port-active mode is configured.
When single-active multi-homing is used and all subinterfaces on the node for the ES must be in DF or non-DF state, the multi-homed CE should not send traffic to the non-DF node. SR Linux supports two ways of signaling the non-DF state to the multi-homed CE: LACP standby or power-off.
Signaling the non-DF state is configured at the interface level, using the command interface ethernet standby-signaling, and must also be enabled for a specific ES using the ethernet-segment df-election interface-standby-signaling-on-non-df command. See EVPN multi-homing configuration example.
The LACP signaling method is only available on LAG interfaces with LACP enabled. When the node is in non-DF state, it uses an LACP out-of-synch notification (the synch bit is clear in the LACP PDUs) to signal the non-DF state to the CE. The CE then brings down LACP, and the system does not jump to the collecting-distributing state, and neither does the peer (because of out_of_sync). After the port is out of standby mode, LACP needs to be re-established, and the forwarding ports need to be programmed after that.
The power-off signaling is available on Ethernet and LAG interfaces. When the node is in non-DF state, the interface goes oper-down, and the lasers on the Ethernet interfaces (all members in case of a LAG) are turned off. This brings the CE interface down and avoids any traffic on the link. The interface state shows oper-state down and oper-down-reason standby-signaling.
Reload-delay timer
After the system boots, the reload-delay timer keeps an interface shut down with the laser off for a configured amount of time until connectivity with the rest of network is established. When applied to an access multi-homed interface (typically an Ethernet Segment interface), this delay can prevent black-holing traffic coming from the multi-homed server or CE.
In EVPN multi-homing scenarios, if one leaf in the ES peer group is rebooting, coming up after an upgrade or a failure, it is important for the ES interface not to become active until after the node is ready to forward traffic to the core. If the ES interface comes up too quickly and the node has not programmed its forwarding tables yet, traffic from the server is black-holed. To prevent this from happening, you can configure a reload-delay timer on the ES interface so that the interface does not become active until after network connectivity is established.
When a reload-delay timer is configured, the interface port is shut down and the laser is turned off from the time that the system determines the interface state following a reboot or reload of the XDP process, until the number of seconds specified in the reload-delay timer elapses.
The reload-delay timer is only supported on Ethernet interfaces that are not enabled with breakout mode. For a multi-homed LAG interface, the reload-delay timer should be configured on all the interface members. The reload-delay timer can be set from 1 to 86,400 seconds. There is no default value; if not configured for an interface, there is no reload-delay timer.
Only ES interfaces should be configured with a non-zero reload-delay timer. Single-homed interfaces and network interfaces (used to forward VXLAN traffic) should not have a reload-delay timer configured.
The following example sets the reload-delay timer for an interface to 20 seconds. The timer starts following a system reboot or when the IMM is reconnected, and the system determines the interface state. During the timer period, the interface is deactivated and the port laser is inactive.
--{ * candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
admin-state enable
ethernet {
reload-delay 20
}
}
When the reload-delay timer is running, the port-oper-down-reason for the port is shown as interface-reload-timer-active. The reload-delay-expires state indicates the amount of time remaining until the port becomes active. For example:
--{ * candidate shared default }--[ ]--
# info from state interface ethernet-1/1
interface ethernet-1/1 {
description eth_seg_1
admin-state enable
mtu 9232
loopback-mode false
ifindex 671742
oper-state down
oper-down-reason interface-reload-time-active
last-change "51 seconds ago"
vlan-tagging true
...
ethernet {
auto-negotiate false
lacp-port-priority 32768
port-speed 100G
hw-mac-address 00:01:01:FF:00:15
reload-delay 20
reload-delay-expires "18 seconds from now"
flow-control {
receive false
transmit false
}
}
}
EVPN multi-homing configuration example
The following is an example of a single-active multi-homing configuration, including standby signaling power-off, AC-DF capability, and preference-based DF algorithm.
The following configures power-off signaling and the reload delay timer for an interface:
--{ * candidate shared default }--[ ]--
# info interface ethernet-1/21 ethernet
standby-signaling power-off // needed to signal non-DF state to the CE
reload-delay 100 // upon reboot, this is required to avoid attracting traffic from the multi-homed CE until the node is ready to forward.
// The time accounts for the time it takes all network protocols to be up and forwarding entries ready.
The following configures DF election settings for the ES, including preference-based DF election and a preference value for the DF election alg. The ac-df setting is set to exclude, which disables the AC-DF capability. The non-revertive option is enabled, which prevents traffic from reverting back to a former DF node when the node reconnects to the network.
--{ * candidate shared default }--[ system network-instance protocols evpn ethernet-segments bgp-instance 1 ethernet-segment eth_seg_1 ]--
# info
admin-state enable
esi 00:01:00:00:00:00:00:00:00:00
interface ethernet-1/21
multi-homing-mode single-active
df-election {
interface-standby-signaling-on-non-df { // presence container that enables the standby-signaling for the ES
}
algorithm {
type preference // enables the use of preference based DF election
preference-alg {
preference-value 100 // changes the default 32767 to a specific value
capabilities {
ac-df exclude // turns off the default ac-df capability
non-revertive true // enables the non-revertive mode
}
}
}
}
The following shows the state (and consequently the configuration) of an ES for single-active multi-homing and indicates the default settings for the algorithm/oper-type. All of the peers in the ES should be configured with the same algorithm type. However, if that is not the case, all the peers fall back to the default algorithm.
--{ * candidate shared default }--[ system network-instance protocols evpn ethernet-segments bgp-instance 1 ethernet-segment eth_seg_1 ]--
# info
admin-state enable
oper-state up
esi 00:01:00:00:00:00:00:00:00:00
interface ethernet-1/21
multi-homing-mode single-active
oper-multi-homing-mode single-active // oper mode may be different if not all the ES peers are configured in the same way
df-election {
interface-standby-signaling-on-non-df {
}
algorithm {
type preference
oper-type preference // if at least one peer in the ES is in type default, all the peers will fall back to default
preference-alg {
preference-value 100
capabilities {
ac-df exclude
non-revertive true
}
}
}
}
routes {
next-hop use-system-ipv4-address
ethernet-segment {
originating-ip use-system-ipv4-address
}
}
association {
network-instance blue {
bgp-instance 1 {
designated-forwarder-role-last-change "2 seconds ago"
designated-forwarder-activation-start-time "2 seconds ago"
designated-forwarder-activation-time 3
computed-designated-forwarder-candidates {
designated-forwarder-candidate 40.1.1.1 {
add-time "2 seconds ago"
designated-forwarder true
}
designated-forwarder-candidate 40.1.1.2 {
add-time "2 minutes ago"
designated-forwarder false
}
}
}
}
}
To display information about the ES, use the show system network-instance ethernet-segments command. For example:
--{ [FACTORY] + candidate shared default }--[ ]--
# show system network-instance ethernet-segments eth_seg_1
------------------------------------------------------------------------------------
eth_seg_1 is up, single-active
ESI : 00:01:00:00:00:00:00:00:00:00
Alg : preference
Peers : 40.1.1.2
Interface: ethernet-1/21
Network-instances:
blue
Candidates : 40.1.1.1 (DF), 40.1.1.2
Interface : ethernet-1/21.1
------------------------------------------------------------------------------------
Summary
1 Ethernet Segments Up
0 Ethernet Segments Down
------------------------------------------------------------------------------------
The detail option displays more information about the ES. For example:
--{ [FACTORY] + candidate shared default }--[ ]--
# show system network-instance ethernet-segments eth_seg_1 detail
====================================================================================
Ethernet Segment
====================================================================================
Name : eth_seg_1
40.1.1.1 (DF)
Admin State : enable Oper State : up
ESI : 00:01:00:00:00:00:00:00:00:00
Multi-homing : single-active Oper Multi-homing : single-active
Interface : ethernet-1/21
ES Activation Timer : None
DF Election : preference Oper DF Election : preference
Last Change : 2021-04-06T08:49:44.017Z
====================================================================================
MAC-VRF Actv Timer Rem DF
eth_seg_1 0 Yes
------------------------------------------------------------------------------------
DF Candidates
------------------------------------------------------------------------------------
Network-instance ES Peers
blue 40.1.1.1 (DF)
blue 40.1.1.2
====================================================================================
On the DF node, the info from state command displays the following:
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state interface ethernet-1/21 | grep oper
oper-state up
oper-state not-present
oper-state up
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance blue interface ethernet-1/21.1
network-instance blue {
interface ethernet-1/21.1 {
oper-state up
oper-mac-learning up
index 6
multicast-forwarding BUM
}
}
On the non-DF node, the info from state command displays the following:
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state interface ethernet-1/21 | grep oper
oper-state down
oper-down-reason standby-signaling
oper-state not-present
oper-state down
oper-down-reason port-down
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance blue interface ethernet-1/21.1
network-instance blue {
interface ethernet-1/21.1 {
oper-state down
oper-down-reason subif-down
oper-mac-learning up
index 7
multicast-forwarding none
}
}
Layer 2 proxy-ARP
Proxy-ARP is a Layer 2 function supported on MAC-VRF network-instances that enables the learning of IP-to-MAC bindings so that leaf nodes can reply to ARP-requests without having to flood those requests in the BD. The proxy-ARP function supports ARP flooding suppression and security protection against ARP spoofing attacks in large BDs.
When proxy-ARP is enabled for a MAC-VRF, a table is created that contains entries related to proxy-ARP for the BD. Entries in the proxy-ARP table can be of the following types:
- Dynamic
Dynamic IP-MAC entries are learned by snooping ARP and ND messages; requires enabling proxy-ARP and enabling dynamic learning. See Dynamic learning for proxy-ARP.
- Static
Static IP-MAC entries are manually provisioned in the proxy-ARP table; requires enabling proxy-ARP and configuring static entries. See Static proxy-ARP entries.
- EVPN-learned
EVPN-learned IP-MAC entries are learned from information received in EVPN MAC/IP (type 2 or RT2) routes from remote PE devices; requires enabling proxy-ARP and configuring bgp-evpn on the MAC-VRF. See EVPN learning for proxy-ARP.
- Duplicate
Duplicate entries are identified as duplicates by the IP duplication detection procedure. You can configure the criteria that determines whether an entry is a duplicate and optionally inject anti-spoofing MACs in case of duplication. See Proxy-ARP duplicate IP detection.
The proxy-ARP table for the MAC-VRF has a default size of 250 entries. You can modify the table size. See Proxy-ARP table.
Layer 2 proxy-ARP illustrates how proxy-ARP functions in a BD.
In this example, the SR Linux leaf nodes snoop and learn the local hosts and add dynamic IP-MAC entries for them in the proxy-ARP table for the MAC-VRF. The IP-MAC bindings for the local hosts are advertised in EVPN MAC/IP (type 2 or RT2) routes and installed as EVPN-learned IP-MAC entries in the proxy-ARP table on the remote SR Linux leaf nodes.
For example, SRL Leaf-1 dynamically learns the IP-MAC binding for Host-1 (10.0.0.1-M1) and adds it to the proxy-ARP table as a dynamically learned entry. The IP-MAC binding for Host-1 is advertised in a type 2 route. The remote SRL Leaf-4 installs the IP-MAC binding for Host-1 in its proxy-ARP table as an EVPN-learned entry. When Host-5 sends an ARP-request for 10.0.0.1, SRL Leaf-4 looks it up in the proxy-ARP table and replies with M1. Because Leaf-4 has the 10.0.0.1-M1 binding in its proxy-ARP table, it does not need to flood the request in the BD.
If the lookup is unsuccessful, the ARP-request is re-injected into the datapath and flooded in the BD. Alternatively, this flooding can be suppressed. See Configuring proxy-ARP traffic flooding options.
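The lookup behavior in the Host-5 example can be sketched as a dictionary lookup with a configurable miss action. This Python fragment is a simplified model only; `handle_arp_request`, the `flood_unknown` flag, and the MAC address used to stand in for M1 are hypothetical.

```python
def handle_arp_request(proxy_table: dict, target_ip: str, flood_unknown: bool = True):
    """Return ("reply", mac) on a hit; on a miss return ("flood", None) or ("drop", None)."""
    mac = proxy_table.get(target_ip)
    if mac is not None:
        return ("reply", mac)  # answer locally, nothing is flooded in the BD
    # Miss: re-inject and flood, unless flooding has been suppressed.
    return ("flood", None) if flood_unknown else ("drop", None)


# Leaf-4 holds the EVPN-learned binding for Host-1; the MAC value stands in for M1.
leaf4_table = {"10.0.0.1": "00:00:00:00:00:01"}
print(handle_arp_request(leaf4_table, "10.0.0.1"))
print(handle_arp_request(leaf4_table, "10.0.0.99"))
```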
See RFC 9161 for details about proxy-ARP in EVPN deployments.
Dynamic learning for proxy-ARP
When dynamic learning for proxy-ARP is enabled, all frames coming into bridged subinterfaces of the MAC-VRF that have Ethertype 0x0806 (ARP) are sent to the CPM for learning. This includes ARP-request and ARP-reply (including gratuitous ARP) messages. Dynamic entries in the proxy-ARP table are created from both message types. Learning an entry is done irrespective of the MAC Destination Address (DA) of the ARP packet (unicast or broadcast).
The dynamically learned information in the table is based on the ARP payload MAC Source Address (SA) and IP SA (the sender fields of the ARP message), and not the frame outer MAC SA (although they normally match). A valid MAC SA must be present for the entry to be learned.
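The learning rule above can be illustrated with a minimal Python sketch. It is not SR Linux code: the function name `learn_binding` is hypothetical, the sketch assumes the binding pairs the sender hardware address with the sender protocol address of an Ethernet/IPv4 ARP payload, and it assumes that a "valid" MAC SA means a non-zero unicast address.

```python
import ipaddress
import struct

# Ethernet/IPv4 ARP payload layout (28 bytes, as defined in RFC 826):
# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)


def learn_binding(arp_payload: bytes):
    """Return (sender_ip, sender_mac) taken from the ARP payload, or None.

    The outer Ethernet header is deliberately not consulted: the binding
    comes from the ARP payload only.
    """
    if len(arp_payload) < ARP_LEN:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, _tpa = struct.unpack(
        ARP_FORMAT, arp_payload[:ARP_LEN]
    )
    # Only Ethernet/IPv4 ARP-request (1) and ARP-reply (2) are considered.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or oper not in (1, 2):
        return None
    # Assumption: "valid MAC SA" means not all-zero and not multicast/broadcast.
    if sha == bytes(6) or sha[0] & 0x01:
        return None
    mac = ":".join(f"{octet:02X}" for octet in sha)
    return str(ipaddress.IPv4Address(spa)), mac
```

Because both ARP-requests and ARP-replies carry the sender fields, the same function covers both message types, including gratuitous ARP.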
In addition to learning the dynamic entry from the ARP-request or ARP-reply, the system re-injects and forwards messages as follows:
- For received ARP-request messages, the system looks up the requested IP address, and if the lookup is successful, it sends an ARP-reply with the MAC→IP information. If the lookup is not successful, the ARP-request is re-injected and flooded based on the flood list and the configured flooding option, with source squelching. See Configuring proxy-ARP traffic flooding options.
  The re-injected ARP-request keeps all existing non-service-delimiting tags of the original frame. Unicast ARP-requests are replied to if there is an entry in the proxy table. If the lookup is not successful, the frame is forwarded to the MAC DA.
- For ARP-reply messages, the MAC DA (unicast) is looked up in the FDB. In case of a hit, the frame is re-injected and unicasted based on the FDB information. If there is no hit, the frame is re-injected and flooded based on the flood list information (with source squelching). The re-injected ARP-reply keeps all existing non-service-delimiting tags of the original frame.
- For ARP frames that are sent to the CPM, the ARP reply is always unicasted to the subinterface on which the ARP-request arrived, even if the MAC itself has not yet been learned in the MAC table.
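The forwarding decisions in the list above can be summarized in a short Python sketch. This is an illustrative model only: the function names, the dictionary-based proxy table and FDB, and the returned action tuples are all hypothetical, and the configurable flooding options are left out for brevity.

```python
from typing import Dict, List, Tuple

BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def flood_targets(flood_list: List[str], ingress: str) -> List[str]:
    """Flood list with source squelching: never send back to the ingress."""
    return [port for port in flood_list if port != ingress]


def handle_arp_request(proxy_table: Dict[str, str], target_ip: str,
                       mac_da: str, ingress: str,
                       flood_list: List[str]) -> Tuple[str, object]:
    """Decide what happens to a snooped ARP-request."""
    mac = proxy_table.get(target_ip)
    if mac is not None:
        # Lookup hit: reply on the subinterface where the request arrived.
        return ("reply", (ingress, mac))
    if mac_da == BROADCAST_MAC:
        # Lookup miss on a broadcast request: re-inject and flood.
        return ("flood", flood_targets(flood_list, ingress))
    # Lookup miss on a unicast request: forward to the MAC DA.
    return ("forward", mac_da)


def handle_arp_reply(fdb: Dict[str, str], mac_da: str, ingress: str,
                     flood_list: List[str]) -> Tuple[str, object]:
    """Decide what happens to a snooped ARP-reply."""
    port = fdb.get(mac_da)
    if port is not None:
        # FDB hit: re-inject and unicast toward the known destination.
        return ("unicast", port)
    # FDB miss: re-inject and flood, excluding the ingress.
    return ("flood", flood_targets(flood_list, ingress))
```

For example, with a proxy table holding 10.0.0.1, a broadcast request for 10.0.0.1 yields a reply, while a request for an unknown IP is flooded to every member of the flood list except the ingress subinterface.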
Disabling dynamic learning for proxy-ARP causes the dynamically learned entries to be flushed from the proxy-ARP table. You can also set an age timer (default disabled) for dynamic entries, after which they are flushed from the proxy table.
In addition, you can set a timer to refresh dynamic entries. At a configured interval (default never) the system generates ARP-requests with the intent to refresh the proxy entry. An ARP-request is sent; if no response is received, another one is attempted.
Configuring dynamic learning for proxy-ARP
To configure dynamic learning for entries in the proxy-ARP table, enable proxy-ARP and dynamic learning for the MAC-VRF.
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
admin-state enable
dynamic-learning {
admin-state enable
age-time 600
send-refresh 200
}
}
}
}
This example also configures the age-time and send-refresh timers. By default, both timers are disabled.
- The age-time specifies in seconds the aging timer for each proxy-ARP entry. When the aging expires, the entry is flushed. The age is reset when a new ARP/GARP/NA for the same IP-MAC binding is received.
- The send-refresh timer sends ARP-request messages at the configured time, so that the owner of the IP address can reply and therefore refresh its IP-MAC (proxy-ARP) and MAC (FDB) entries.
Static proxy-ARP entries
You can configure static entries in the proxy-ARP table. Static entries have higher priority than snooped dynamic and EVPN entries in the table.
A static proxy-ARP entry requires a static or dynamic MAC entry in the MAC table to become active. The static entries are advertised in MAC/IP routes, with the MAC Mobility extended community following the information associated with the MAC entry (static bit or sequence number).
The system does not validate the MAC addresses configured in static entries.
When proxy-arp is disabled, the configured static entries remain in the proxy table, unlike dynamically learned and EVPN-learned entries, which are flushed from the table.
Configuring static proxy-ARP entries
The following example enables proxy-ARP and configures a static entry in the proxy-ARP table. Note that the system does not validate MAC addresses specified in static entries.
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
admin-state enable
static-entries {
neighbor 101.1.1.1 {
link-layer-address 00:00:64:01:01:01
}
}
}
}
}
EVPN learning for proxy-ARP
When proxy-ARP is configured in an EVPN, the PE devices dynamically learn IP-MAC bindings for their local hosts and advertise them in type 2 routes to remote PE devices. The remote PE devices add these IP-MAC bindings to the proxy-ARP table as EVPN-learned entries.
When a host sends an ARP-request for a host on the remote side of the EVPN, if the PE device has the EVPN-learned IP-MAC binding in its proxy-ARP table, it sends back an ARP-reply with the remote host's MAC. If the IP-MAC binding does not exist in the PE device's proxy-ARP table, the ARP-request is flooded in the BD (ARP-request flooding can be optionally disabled). See Layer 2 proxy-ARP for an illustration of this process.
Configuring EVPN learning for proxy-ARP
To configure EVPN learning for proxy-ARP, enable proxy-ARP for the MAC-VRF and configure bgp-evpn to advertise the IP-MAC bindings in EVPN MAC/IP Advertisement routes and learn new IP-MAC bindings from the imported MAC/IP Advertisement routes.
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
admin-state enable
}
}
}
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 protocols bgp-evpn bgp-instance 1 routes bridge-table
network-instance MAC-VRF-1 {
protocols {
bgp-evpn {
bgp-instance 1 {
routes {
bridge-table {
mac-ip {
advertise true
}
}
}
}
}
}
}
Configuring proxy-ARP traffic flooding options
If a lookup in the proxy-ARP table is unsuccessful, by default the system re-injects the ARP-request into the data path and floods it in the BD. You can configure the following options for how ARP frames are flooded into the EVPN network. The default for both of these options is true.
unknown-arp-req
This option configures whether unknown broadcast ARP-requests are flooded to EVPN destinations. Unknown in this context means the lookup in the proxy-ARP table was unsuccessful. Non-broadcast ARP-requests are not affected by this option.
gratuitous-arp
This option configures whether Gratuitous ARP (GARP) requests or replies are flooded to EVPN destinations. GARPs are ARP messages where the sender's IP address matches the target's IP address. Normally the MAC DA is a broadcast address.
The following example disables flooding to EVPN destinations for both unknown broadcast ARP-requests and GARPs:
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp evpn flood
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
evpn {
flood {
unknown-arp-req false
gratuitous-arp false
}
}
}
}
}
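As a rough illustration of how the two flooding options interact, the following Python sketch models the decision for a frame whose proxy-ARP lookup was unsuccessful and that is about to be flooded. The class and function names are hypothetical, and the order of the checks (GARP first, then unknown broadcast ARP-request) is an assumption made for the sketch, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class EvpnFloodOptions:
    # Both options default to true, as described above.
    unknown_arp_req: bool = True
    gratuitous_arp: bool = True


def is_garp(sender_ip: str, target_ip: str) -> bool:
    """A GARP is an ARP message whose sender IP equals its target IP."""
    return sender_ip == target_ip


def evpn_flood_permitted(opts: EvpnFloodOptions, sender_ip: str,
                         target_ip: str, is_request: bool,
                         is_broadcast: bool) -> bool:
    """Return True if the frame may be flooded to EVPN destinations."""
    if is_garp(sender_ip, target_ip):
        # GARP requests and replies are governed by gratuitous-arp.
        return opts.gratuitous_arp
    if is_request and is_broadcast:
        # Unknown broadcast ARP-requests are governed by unknown-arp-req.
        return opts.unknown_arp_req
    # Non-broadcast ARP-requests are not affected by these options.
    return True
```

With both options set to false, as in the example configuration, an unknown broadcast ARP-request and a GARP are both kept out of the EVPN network, while a unicast ARP-request is unaffected.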
Proxy-ARP duplicate IP detection
Proxy-ARP duplicate IP detection is a security mechanism described in RFC 9161 to detect ARP-spoofing attacks. In an ARP-spoofing attack, an attacker sends false ARP messages into a BD, with the goal of associating the attacker's MAC address with the IP address of a target host and directing traffic for that host to the attacker.
The proxy-ARP duplicate IP detection feature monitors changes to active entries in the proxy-ARP table. When an IP move occurs (for example, IP1→MAC1 is replaced by IP1→MAC2 in the table), a monitoring-window timer is started (default 3 minutes). If a specified number of IP moves (default 5) is detected before the monitoring-window timer expires, the IP is considered to be a duplicate.
When the system detects an IP move in the proxy table (for example, IP1→MAC1 changing to IP1→MAC2) it places the IP1→MAC2 proxy table entry in pending-confirmation state for a maximum of 30 seconds. During the pending-confirmation period, the ARP entry is inactive, and a confirm-message is unicast to MAC1. If no reply from MAC1 is received during the pending-confirmation period, the IP1→MAC2 entry is confirmed as legitimate and becomes active. If a reply from MAC1 is received, then MAC2 is sent a confirm-message. If MAC2 replies, an additional confirm-message is sent to MAC1. If both MAC1 and MAC2 keep replying to the confirm-messages, it triggers the duplicate IP detection procedure for IP1, because the number of IP moves exceeds the maximum allowed during the monitoring window.
When an IP is detected as a duplicate, the proxy table cannot be updated with new dynamic or EVPN-learned entries for the same IP (although you can configure a static entry for the IP). The duplicate IP is subject to this restriction until a hold-down timer expires (default 9 minutes), after which the entry for the IP is removed from the proxy table, and the monitoring process for the IP is restarted.
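The interplay of the monitoring window, the move counter, and the hold-down timer can be sketched in Python as follows. This is a simplified model, not the SR Linux implementation: the class `DuplicateIpDetector` and its methods are hypothetical, times are plain seconds supplied by the caller, the pending-confirmation exchange is omitted, and the sketch assumes an IP is declared duplicate when the number of moves reaches `num_moves` within the window.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DuplicateIpDetector:
    monitoring_window: float = 3 * 60   # seconds (default 3 minutes)
    num_moves: int = 5                  # default 5 IP moves
    hold_down_time: float = 9 * 60      # seconds (default 9 minutes)
    _moves: Dict[str, Tuple[float, int]] = field(default_factory=dict)
    _duplicates: Dict[str, float] = field(default_factory=dict)

    def is_duplicate(self, ip: str, now: float) -> bool:
        """True while the IP is held down as a duplicate."""
        detected = self._duplicates.get(ip)
        if detected is None:
            return False
        if now - detected >= self.hold_down_time:
            # Hold-down expired: forget the IP and restart monitoring.
            del self._duplicates[ip]
            return False
        return True

    def record_move(self, ip: str, now: float) -> bool:
        """Register an IP move; return True if the IP is a duplicate."""
        if self.is_duplicate(ip, now):
            return True
        start, count = self._moves.get(ip, (now, 0))
        if now - start > self.monitoring_window:
            # Previous window expired: this move opens a new window.
            start, count = now, 0
        count += 1
        if count >= self.num_moves:
            self._duplicates[ip] = now
            self._moves.pop(ip, None)
            return True
        self._moves[ip] = (start, count)
        return False
```

With the default values, five moves of the same IP within three minutes mark it as a duplicate for nine minutes, whereas the same five moves spread over a longer period never trigger detection because the window restarts.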
Anti-Spoofing MAC
You can configure an Anti-Spoofing MAC (AS-MAC). If an AS-MAC is configured, the system associates the duplicate IP with the AS-MAC in the proxy-ARP table. A GARP/unsolicited-NA message with IP1→AS-MAC is sent to the local CEs, and a type-2 route with IP1→AS-MAC is sent to the remote PEs. This updates the ARP caches on the CEs in the BD, so that CEs in the BD use the AS-MAC as MAC DA when sending traffic to IP1.
If you configure the static-blackhole true option, the AS-MAC is installed in the MAC table as a blackhole MAC, which discards incoming frames with a MAC source or destination matching the AS-MAC.
Configuring proxy-ARP duplicate IP detection
monitoring-window
This is the number of minutes the system monitors a proxy-ARP table entry following an IP move (default 3 minutes).
num-moves
This is the maximum number of IP moves a proxy-ARP table entry can have during the monitoring-window before the IP is considered duplicate (default 5 IP moves).
hold-down-time
This is the number of minutes from the time an IP is declared duplicate to the time the IP is removed from the proxy-ARP table (default 9 minutes).
anti-spoof-mac
If configured, this replaces the MAC of the duplicate IP in the proxy-ARP table. The AS-MAC is advertised in EVPN to remote PEs.
static-blackhole
If this option is set to true, this installs the AS-MAC in the MAC table as a blackhole MAC, causing incoming frames with a MAC source or destination matching the AS-MAC to be discarded.
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp ip-duplication
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
ip-duplication {
monitoring-window 5
num-moves 7
hold-down-time 10
anti-spoof-mac 00:CA:FE:CA:FE:08
static-blackhole true
}
}
}
}
Displaying proxy-ARP duplicate IP detection information
Use the following show command to display information about duplicate IPs detected in the proxy-ARP table.
To display duplicate IP information from the proxy-ARP table for all MAC-VRFs:
--{ running }--[ ]--
# show network-instance * bridge-table proxy-arp ip-duplication duplicate-entries
---------------------------------------------------------------------------------------------
IP-duplication in network instance mac-vrf-1
---------------------------------------------------------------------------------------------
Monitoring window : 3 minutes
Number of moves allowed : 5
Hold-down-time : 10 seconds
Anti-Spoof-MAC : 00:DE:AD:00:00:01 (Static-blackhole)
---------------------------------------------------------------------------------------------
Duplicate entries in network instance mac-vrf-1
----------------------------------------------------------------------------------------------
+--------------+------------------------------+---------------------------+------------------+
| Neighbor | MAC Address | Detect Time | Hold down time |
| | | | remaining |
+==============+==============================+===========================+==================+
| 10.10.10.1 | 00:DE:AD:00:00:01 | 2021-12-11T12:48:24.000Z | 10 |
| 10.10.10.2 | 00:DE:AD:00:00:01 | 2021-12-11T12:48:24.000Z | 9 |
| 10.10.10.3 | 00:DE:AD:00:00:01 | 2021-12-11T12:48:24.000Z | 10 |
+--------------+------------------------------+---------------------------+------------------+
---------------------------------------------------------------------------------------------
IP-duplication in network instance mac-vrf-2
---------------------------------------------------------------------------------------------
...
---------------------------------------------------------------------------------------------
Total Duplicate IPs : 4 Total 4 Active
---------------------------------------------------------------------------------------------
Proxy-ARP table
When you enable proxy-ARP for a MAC-VRF, this creates a table containing proxy-ARP entries learned dynamically by snooping ARP messages, configured statically, or learned from type-2 routes from remote PE nodes. By default, this table can contain up to 250 entries. You can configure the size of this table to be from 1 to 8,192 entries.
The system generates a log event when the size of the table reaches 90% of the maximum size and when the table reaches 95% of the maximum size. When the table reaches 100% of its maximum size, entries for an IP can be replaced (that is, a different MAC can be learned and added for the IP), but no new IP entries can be added to the table, regardless of the type (dynamic, static, or EVPN-learned).
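The thresholds and the full-table rule can be checked with a small Python sketch. The function names are hypothetical, and rounding the percentages down is an assumption inferred from the summary output in this chapter, which shows 237 entries as 95% of 250.

```python
from typing import Dict


def table_thresholds(table_size: int) -> Dict[str, int]:
    """Entry counts at which the 90% and 95% levels are reached."""
    if not 1 <= table_size <= 8192:
        raise ValueError("table size must be from 1 to 8192 entries")
    return {
        "90%": table_size * 90 // 100,
        "95%": table_size * 95 // 100,
        "100%": table_size,
    }


def can_store(entries: Dict[str, str], table_size: int, ip: str) -> bool:
    """A full table still accepts a new MAC for an IP it already holds,
    but rejects entries for IPs that are not yet present."""
    return ip in entries or len(entries) < table_size
```

For the default table size of 250, this gives 225 and 237 entries for the 90% and 95% levels, matching the values shown by the summary command.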
Configuring the proxy-ARP table size
By default, the proxy-ARP table can contain up to 250 entries of all types (dynamic, static, EVPN, duplicate). You can increase or decrease the maximum size of the table. If you configure the table size to be lower than the number of entries it currently contains, the system stops and restarts the proxy-ARP application, causing the non-static entries to be flushed from the table.
The following example configures the size of the proxy-ARP table for a MAC-VRF:
--{ candidate shared default }--[ ]--
# info network-instance MAC-VRF-1 bridge-table proxy-arp table-size
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
table-size 125
}
}
}
Clearing entries from the proxy-ARP table
Use the following tools commands to clear dynamic/duplicate entries from the proxy-ARP table. You can clear all dynamic/duplicate entries or you can clear specific entries.
To clear dynamic entries from the proxy-ARP table:
--{ running }--[ ]--
# tools network-instance MAC-VRF-1 bridge-table proxy-arp dynamic delete-all
# tools network-instance MAC-VRF-1 bridge-table proxy-arp dynamic entry 10.10.10.1 delete-ip
To clear duplicate entries from the proxy-ARP table:
--{ running }--[ ]--
# tools network-instance MAC-VRF-1 bridge-table proxy-arp duplicate delete-all
# tools network-instance MAC-VRF-1 bridge-table proxy-arp duplicate entry 10.10.10.1 delete-ip
Displaying proxy-ARP information
Use the following show commands to display information about the contents of the proxy-ARP table.
To display all entries in the proxy-ARP table for a MAC-VRF:
--{ candidate shared default }--[ ]--
# show network-instance MAC-VRF-1 bridge-table proxy-arp all
-----------------------------------------------------------------------------------------------
Proxy-ARP table of network instance MAC-VRF-1
-----------------------------------------------------------------------------------------------
+-------------+--------------------+------------+---------+--------+--------------------------+
| Neighbor | MAC Address | Type | State | Aging | Last Update |
+=============+====================+============+=========+========+==========================+
| 10.10.10.1 | 00:CA:FE:CA:FE:01 | dynamic | active | 300 | 2021-12-11T12:48:24.000Z |
| 10.10.10.2 | 00:CA:FE:CA:FE:02 | evpn | active | N/A | 2021-12-11T12:48:24.000Z |
| 10.10.10.3 | 00:CA:FE:CA:FE:03 | duplicate | active | N/A | 2021-12-11T12:48:24.000Z |
+-------------+--------------------+------------+---------+--------+--------------------------+
Total Static Neighbors : 0 Total 0 Active
Total Dynamic Neighbors : 1 Total 1 Active
Total Evpn Neighbors : 1 Total 1 Active
Total Duplicate Neighbors : 1 Total 1 Active
Total Neighbors : 3 Total 3 Active
-----------------------------------------------------------------------------------------------
The output of this command indicates the total number of entries of each type, as well as the number that are active (replies are sent for received ARP-requests). For an entry to be considered active in the proxy-ARP table, it must have a corresponding entry in the MAC table of the type listed in the following table:
Proxy-ARP table entry type | MAC table entry type |
---|---|
Dynamic | Learned |
Dynamic | Static |
Dynamic | EVPN |
Static | Learned |
Static | Static |
EVPN | EVPN (irrespective of the ESI) |
EVPN | Static or dynamic matching the EVPN ESI |
Duplicate | – |
If a proxy-ARP table entry has a corresponding entry in the MAC table of a type not listed in this table, then the proxy-ARP table entry is considered inactive, so no replies are sent for received ARP-requests. In addition, if the MAC-VRF is not active, its proxy-ARP table entries are not active either.
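The activation rules can be expressed as a lookup, sketched here in Python. The function `is_active` and its string type names are hypothetical. The sketch assumes that "dynamic" in the MAC table column refers to a learned MAC, and that the dash for duplicate entries means no MAC table entry is required.

```python
from typing import Optional

# Allowed MAC table entry types per proxy-ARP entry type (non-EVPN cases).
ALLOWED_MAC_TYPES = {
    "dynamic": {"learned", "static", "evpn"},
    "static": {"learned", "static"},
}


def is_active(proxy_type: str, mac_type: Optional[str],
              esi_match: bool = False, mac_vrf_active: bool = True) -> bool:
    """Return True if a proxy-ARP entry would be considered active."""
    if not mac_vrf_active:
        # Entries of an inactive MAC-VRF are never active.
        return False
    if proxy_type == "duplicate":
        # Assumption: duplicate entries do not depend on a MAC table entry.
        return True
    if mac_type is None:
        return False
    if proxy_type == "evpn":
        # An EVPN MAC matches irrespective of the ESI; a static or learned
        # MAC matches only if it is associated with the same EVPN ESI.
        return mac_type == "evpn" or (
            mac_type in ("static", "learned") and esi_match
        )
    return mac_type in ALLOWED_MAC_TYPES.get(proxy_type, set())
```

For example, a static proxy-ARP entry backed only by an EVPN MAC is inactive, so no ARP-reply is generated for it.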
To display information about a specific entry in the proxy-ARP table:
--{ candidate shared default }--[ ]--
# show network-instance MAC-VRF-1 bridge-table proxy-arp neighbor 10.10.10.1
----------------------------------------------------------------------------
Proxy-ARP table of network instance MAC-VRF-1
----------------------------------------------------------------------------
Neighbor : 10.10.10.1
MAC Address : 00:CA:FE:CA:FE:01
Type : dynamic
Programming Status : Success
Aging : 300
Last Update : 2021-12-11T12:48:24.000Z
Duplicate Detect time : N/A
Hold down time remaining: N/A
----------------------------------------------------------------------------
To display a summary of the contents of the proxy-ARP table for all MAC-VRFs:
--{ candidate shared default }--[ ]--
# show network-instance * bridge-table proxy-arp summary
------------------------------------------------------
Network Instance Proxy-ARP Table Summary
------------------------------------------------------
Network Instance: MAC-VRF-1
Total Static Neighbors : 0 Total 0 Active
Total Dynamic Neighbors : 1 Total 1 Active
Total Evpn Neighbors : 1 Total 1 Active
Total Duplicate Neighbors : 1 Total 1 Active
Total Neighbors : 3 Total 3 Active
Maximum Entries : 250
Warning Threshold: 95% (237)
Clear Warning : 90% (225)
-------------------------------------------------------
Network Instance: MAC-VRF-2
Total Static Neighbors : 1 Total 1 Active
Total Dynamic Neighbors : 1 Total 1 Active
Total Evpn Neighbors : 0 Total 0 Active
Total Duplicate Neighbors : 0 Total 0 Active
Total Neighbors : 2 Total 2 Active
Maximum Entries : 250
Warning Threshold: 95% (237)
Clear Warning : 90% (225)
--------------------------------------------------------
--------------------------------------------------------
Total Static Neighbors : 1 Total 1 Active
Total Dynamic Neighbors : 2 Total 2 Active
Total Evpn Neighbors : 1 Total 1 Active
Total Duplicate Neighbors : 1 Total 1 Active
Total Neighbors : 5 Total 5 Active
---------------------------------------------------------
Displaying proxy-ARP statistics
To display proxy-ARP statistics, use the info from state command in candidate or running mode, or the info command in state mode. You can display system-wide statistics or statistics for a specific MAC-VRF network-instance.
The following example displays proxy-ARP statistics for a MAC-VRF network-instance:
--{ candidate shared default }--[ ]--
# info from state network-instance MAC-VRF-1 bridge-table proxy-arp statistics
network-instance MAC-VRF-1 {
bridge-table {
proxy-arp {
statistics {
total-entries 0
active-entries 0
in-active-entries 0
pending-entries 0
neighbor-origin {
origin
total-entries 0
active-entries 0
in-active-entries 0
pending-entries 0
}
}
}
}