EVPN-VXLAN for layer 3
The EVPN Layer 3 configuration model builds on the model for EVPN routes described in EVPN-VXLAN for layer-2 and multi-homing. Understanding the concepts in the EVPN-VXLAN Layer-2 chapter is required to understand this chapter.
This chapter addresses connectivity between subnets across multiple Broadcast Domains (BDs) of the same tenant as defined in the EVPN Interface-less (IFL) model [draft-ietf-bess-evpn-prefix-advertisement]. It is based on the advertisement and processing of IP prefixes using EVPN type 5 routes. This chapter defines how CEs or servers can be multi-homed to multiple leaf nodes in an EVPN IFL network. It also describes other EVPN L3 topics such as:
IRB subinterface extensions for EVPN-VXLAN Layer 3
unicast routing PE-CE
layer 3 host mobility
Applicability
The information and configuration in this chapter are based on SR Linux Release 21.6.
Overview
EVPN IP-VRF domain in a multi-tenant DC shows four leaf routers attached to the same tenant or IP-VRF domain. Servers are connected to different subnets and, therefore, different BDs. The leaf routers provide inter-subnet forwarding by using the EVPN IFL model as defined in [draft-ietf-bess-evpn-prefix-advertisement]. The SR Linux implementation is fully standard and third-party routers, such as LEAF-4, can be connected to the same IP-VRF domain as SR Linux routers.
The procedures in this chapter define the configuration and operation for:
the EVPN IFL model for EVPN-VXLAN Layer 3 and the EVPN IP prefix routes
multi-homing in an EVPN-VXLAN Layer 3 solution
host route mobility aspects
PE-CE unicast routing on EVPN-VXLAN Layer 3 networks
EVPN-VXLAN Layer 3 feature parity for IPv6 prefixes
Configuration of EVPN-VXLAN IP-VRF domains
The following figure shows the configuration of an EVPN-VXLAN IP-VRF-10 distributed in three leaf routers, with different subnets, and multi-homing for SERVER-1.
Preconfiguring the underlay network
Before configuring the overlay BD, the underlay connectivity must be configured.
This chapter uses the same underlay configuration defined for EVPN-VXLAN Layer-2 and Multi-Homing. See Configuring the underlay network and review this section to understand the underlay configuration before proceeding.
Configuring the LEAF-3 IP-VRF domain
After LEAF-3 is pre-configured as defined in Configuring the underlay network, use the following steps to enable EVPN-VXLAN on LEAF-3.
As shown in Example of EVPN-VXLAN IP-VRF domain, LEAF-3 is attached to IP-VRF-10 and HOST-3 is connected to BD3. BD3 is mapped to subnet 103.1.1.0/24 and its IRB subinterface is the default-gateway to all hosts in BD3.
1. In candidate mode, create the interfaces and bridged subinterfaces that connect LEAF-3 BD3 to HOST-3.
In this example:
   - Ethernet-1/2 connects HOST-3 to LEAF-3. Although this interface could be defined untagged, this example configures the interface as tagged (vlan-tagging true).
   - A subinterface with index 3 is created under the interface and must be configured as type bridged. Bridged subinterfaces can be associated with MAC-VRF instances as described in EVPN-VXLAN for layer-2 and multi-homing.
   - The subinterface uses vlan-id any, which captures any traffic that is not specified in other subinterfaces of the same interface.
Note: The IRB subinterface expects no vlan tags so that traffic forwarded from HOST-3 can be routed. If HOST-3 sends frames tagged with a vlan-id, the frames are classified in the BD3 context, but the subinterface does not strip off the vlan tag, and the frames are not routed.
--{ [FACTORY] + candidate shared default }--[ interface ethernet-1/2 ]--
# info
    admin-state enable
    vlan-tagging true
    subinterface 3 {
        type bridged
        admin-state enable
        vlan {
            encap {
                single-tagged {
                    vlan-id any
                }
            }
        }
    }
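The classification rule for vlan-id any described in this step can be sketched as follows. This is a simplified illustration, not the SR Linux datapath: the function name classify and the extra subinterface 7 with vlan-id 100 are hypothetical and are only there for contrast. The sketch assumes that untagged frames (None) also fall through to the "any" subinterface, which is consistent with HOST-3 sending untagged frames in this example.

```python
from typing import Dict, Optional, Union

def classify(frame_vlan: Optional[int],
             subinterfaces: Dict[int, Union[int, str]]) -> Optional[int]:
    """Return the index of the subinterface that captures a frame, or None.

    subinterfaces maps a subinterface index to its configured vlan-id:
    an int for a specific tag, or the string "any".
    """
    # A subinterface configured with the exact vlan-id takes the frame first.
    for index, vlan_id in subinterfaces.items():
        if frame_vlan is not None and vlan_id == frame_vlan:
            return index
    # Anything not specified elsewhere falls back to the "any" subinterface.
    for index, vlan_id in subinterfaces.items():
        if vlan_id == "any":
            return index
    return None

# Subinterface 3 (vlan-id any) is taken from the example above.
# Subinterface 7 (vlan-id 100) is invented for illustration only.
ethernet_1_2 = {3: "any", 7: 100}

print(classify(None, ethernet_1_2))  # untagged frame from HOST-3
print(classify(100, ethernet_1_2))   # frame tagged with vlan-id 100
print(classify(42, ethernet_1_2))    # frame tagged with an unlisted vlan-id
```

Note that classification into BD3 does not imply the frame is routable: as the note above explains, a tagged frame captured by subinterface 3 keeps its tag and is not routed by the IRB.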
2. After creating the access subinterfaces, create the vxlan-interfaces.
This allows MAC-VRFs of the same BD to be connected throughout the IP fabric. In this example, no other leaf router is attached to BD3, so no vxlan-interface is needed in BD3. The configuration of the vxlan-interface is only shown for completeness. See EVPN-VXLAN for layer-2 and multi-homing for details on vxlan-interfaces and their characteristics.
vxlan1 vxlan-interface 3 configuration
--{ [FACTORY] + candidate shared default }--[ tunnel-interface vxlan1 ]--
# info
    vxlan-interface 3 {
        type bridged
        ingress {
            vni 3
        }
    }
3. IP-VRF instances in the leaf routers are also connected by VXLAN tunnels; therefore, vxlan-interfaces of type routed must be created.
In this example, tunnel-interface vxlan2 vxlan-interface 10 is configured. While the configuration of the routed vxlan-interface is similar to the bridged vxlan-interface, this type ensures a routed vxlan-interface only attaches to an ip-vrf, and a bridged vxlan-interface only attaches to a mac-vrf.
VXLAN tunnel configuration
--{ [FACTORY] + candidate shared default }--[ tunnel-interface vxlan2 ]--
# info
    vxlan-interface 10 {
        type routed
        ingress {
            vni 10
        }
    }
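The attachment rule stated in this step (routed vxlan-interfaces attach only to ip-vrfs, bridged ones only to mac-vrfs) can be expressed as a small pre-commit check. This is a minimal sketch; the names ALLOWED_TYPE and can_attach are hypothetical and do not correspond to any SR Linux API.

```python
# Which vxlan-interface type each network-instance type accepts,
# as described in the step above.
ALLOWED_TYPE = {"mac-vrf": "bridged", "ip-vrf": "routed"}

def can_attach(vxlan_interface_type: str, network_instance_type: str) -> bool:
    """Return True if the vxlan-interface type matches the network-instance type."""
    return ALLOWED_TYPE.get(network_instance_type) == vxlan_interface_type

print(can_attach("bridged", "mac-vrf"))  # vxlan1.3 into BD3
print(can_attach("routed", "ip-vrf"))    # vxlan2.10 into IP-VRF-10
print(can_attach("routed", "mac-vrf"))   # mismatched attachment
```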
4. Configure the IRB interface and subinterface that links BD3 and IP-VRF-10 together to allow packets from/to subnet 103.1.1.0/24 to route to/from remote subnets in the local or remote leaf routers of the same tenant.
Note that because BD3 is not present in another leaf, the IRB subinterface is not configured as an anycast-gw subinterface. However, an operator may want to configure all IRB subinterfaces as anycast-gw in case the BD is extended later. See Configuring the IP-VRF Domain on LEAF-2 and LEAF-4 for anycast-gw configuration details.
IRB configuration
--{ [FACTORY] + candidate shared default }--[ interface irb0 ]--
# info
    subinterface 3 {
        ipv4 {
            address 103.1.1.254/24 {
            }
        }
    }
5. Configure the network-instance of type mac-vrf and associate it with the bridged subinterface, the IRB subinterface, and the vxlan-interface.
In the example that follows, BD3 is connected to HOST-3 and to the IRB subinterface that is also attached to IP-VRF-10. Although they are not needed in this example, the bgp-evpn and vxlan-interface configuration is shown for completeness. For details about bgp-vpn, bgp-evpn, and vxlan-interface configuration, see EVPN-VXLAN for layer-2 and multi-homing.
mac-vrf configuration and bridged interface association
--{ [FACTORY] + candidate shared default }--[ network-instance BD3 ]--
# info
    type mac-vrf
    interface ethernet-1/2.3 {
    }
    interface irb0.3 {
    }
    vxlan-interface vxlan1.3 {
    }
    protocols {
        bgp-evpn {
            bgp-instance 1 {
                admin-state enable
                vxlan-interface vxlan1.3
                evi 3
                ecmp 2
            }
        }
        bgp-vpn {
            bgp-instance 1 {
                route-target {
                    export-rt target:64500:3
                    import-rt target:64500:3
                }
            }
        }
    }
6. Configure IP-VRF-10 with bgp-evpn enabled using the EVPN IFL model.
In this example, IP-VRF-10 is configured with the irb0.3 interface for connectivity to BD3 and with vxlan2.10, which allows the extension of IP-VRF-10 to remote leaf routers. A loopback interface is configured in the IP-VRF to test connectivity among IP-VRFs of the same tenant. When bgp-evpn is configured, all local routes in the IP-VRF-10 route-table are advertised in IP Prefix routes (or IPv6 Prefix routes for local IPv6 routes).
Enable EVPN IFL model on IP-VRF-10
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF* ]--
# info
    network-instance IP-VRF-10 {
        type ip-vrf
        interface irb0.3 {
        }
        interface lo10.2 {
        }
        vxlan-interface vxlan2.10 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    vxlan-interface vxlan2.10
                    evi 10
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:10
                        import-rt target:64500:10
                    }
                }
            }
        }
    }
The bgp-vpn instance is configured as described in Chapter EVPN-VXLAN for layer-2 and multi-homing (with a network-instance of type ip-vrf). Likewise, the bgp-evpn container enables EVPN in the ip-vrf and associates it with the routed vxlan-interface. The RD and RT can be auto-derived from the evi, just as in mac-vrfs. Explicitly configured RD/RTs override the auto-derived ones. Import and export policies are supported.
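The relationship between auto-derived and explicitly configured RD/RT values can be sketched as follows. The formats are assumptions for illustration: the RD form system-ip:evi agrees with the RDs that appear in the show outputs of this chapter (for example 3.3.3.3:10), while the RT form target:asn:evi is assumed and is not confirmed by this chapter, which configures its RTs explicitly. All function names are hypothetical.

```python
from typing import Optional

def auto_rd(system_ip: str, evi: int) -> str:
    """Auto-derived route distinguisher, assumed to be <system-ip>:<evi>."""
    return f"{system_ip}:{evi}"

def auto_rt(asn: int, evi: int) -> str:
    """Auto-derived route target, assumed to be target:<asn>:<evi>."""
    return f"target:{asn}:{evi}"

def effective(configured: Optional[str], derived: str) -> str:
    """An explicitly configured value overrides the auto-derived one."""
    return configured if configured is not None else derived

# IP-VRF-10 on LEAF-3: evi 10, explicit RT, no explicit RD.
rd = effective(None, auto_rd("3.3.3.3", 10))
rt = effective("target:64500:10", auto_rt(64500, 10))
print(rd, rt)
```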
7. Review the changes. If correct, commit the changes.
# commit stay
--{ candidate shared default }--[ ]--
Configuring the IP-VRF Domain on LEAF-2 and LEAF-4
LEAF-2 and LEAF-4 are configured in the same way as LEAF-3, but with the addition of multi-homing, anycast-gw interfaces, and related configurations.
Use the following procedure to enable EVPN-VXLAN Layer 3 on LEAF-2 and LEAF-4. Considerations for configuring the IRB subinterfaces (Step 4) are provided in IRB subinterface considerations, if needed.
1. In candidate mode, create the interfaces and bridged subinterfaces (including the LAG) to connect HOST-2, HOST-4, SERVER-1, and CE-3.
In this example, Ethernet Segment ES-1 is associated with lag1, which multi-homes the SERVER-1 LAG to both leaf nodes. LACP is enabled on lag1 (but can be disabled) and the admin-key, system-id-mac, and system-priority match on both LEAF-2 and LEAF-4.
Interfaces and bridged subinterfaces configuration (LEAF-2)
// interfaces in Leaf-2
--{ [FACTORY] + candidate shared default }--[ interface * ]--
# info
    interface ethernet-1/11 {
        description ES-1
        ethernet {
            aggregate-id lag1
        }
    }
    interface ethernet-1/12 {
        description to-host-2
        admin-state enable
        vlan-tagging true
        subinterface 24 {
            type bridged
            vlan {
                encap {
                    single-tagged {
                        vlan-id any
                    }
                }
            }
        }
    }
    interface ethernet-1/22 {
        description to-CE-3
        vlan-tagging true
        subinterface 2 {
            type bridged
            admin-state enable
            vlan {
                encap {
                    single-tagged {
                        vlan-id 2
                    }
                }
            }
        }
    }
    interface lag1 {
        admin-state enable
        vlan-tagging true
        subinterface 24 {
            type bridged
            vlan {
                encap {
                    single-tagged {
                        vlan-id 24
                    }
                }
            }
        }
        lag {
            lag-type lacp
            member-speed 100G
            lacp {
                interval FAST
                lacp-mode ACTIVE
                admin-key 24
                system-id-mac 00:00:00:00:00:24
                system-priority 24
            }
        }
    }
Interfaces and bridged subinterfaces configuration (LEAF-4)
// Interfaces in Leaf-4
--{ [FACTORY] + candidate shared default }--[ interface * ]--
# info
    interface ethernet-1/4 {
        description ES-1
        ethernet {
            aggregate-id lag1
        }
    }
    interface ethernet-1/13 {
        description to-host-4
        admin-state enable
        vlan-tagging true
        subinterface 4 {
            type bridged
            admin-state enable
            vlan {
                encap {
                    single-tagged {
                        vlan-id any
                    }
                }
            }
        }
    }
    interface lag1 {
        admin-state enable
        vlan-tagging true
        subinterface 24 {
            type bridged
            vlan {
                encap {
                    single-tagged {
                        vlan-id 24
                    }
                }
            }
        }
        lag {
            lag-type lacp
            member-speed 100G
            lacp {
                interval FAST
                lacp-mode ACTIVE
                admin-key 24
                system-id-mac 00:00:00:00:00:24
                system-priority 24
            }
        }
    }
2. Create the all-active Ethernet Segment ES-1 that is attached to LEAF-2 and LEAF-4.
Details about the ES configuration and operation can be found in EVPN-VXLAN for layer-2 and multi-homing. Note that the following example only shows the Ethernet Segment configuration. The bgp-vpn>bgp-instance 1 must also be configured as described in EVPN-VXLAN for layer-2 and multi-homing.
ES configuration
// ES-1 on Leaf-2
--{ [FACTORY] + candidate shared default }--[ system network-instance protocols evpn ethernet-segments ]--
# info
    bgp-instance 1 {
        ethernet-segment ES-1 {
            admin-state enable
            esi 01:24:24:24:24:24:24:00:00:01
            interface lag1
            multi-homing-mode all-active
        }
    }
// ES-1 on Leaf-4
--{ [FACTORY] + candidate shared default }--[ system network-instance protocols evpn ethernet-segments ]--
A:dut4# info
    bgp-instance 1 {
        ethernet-segment ES-1 {
            admin-state enable
            esi 01:24:24:24:24:24:24:00:00:01
            interface lag1
            multi-homing-mode all-active
        }
    }
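Because the ESI, the multi-homing mode, and the LACP parameters must agree on both leaf nodes, a quick comparison of the two attachments can catch drift before committing. The following is a minimal sketch using the values from Steps 1 and 2; the EsAttachment class and the mismatches function are hypothetical helpers, not part of any SR Linux tooling.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class EsAttachment:
    leaf: str
    esi: str
    multi_homing_mode: str
    admin_key: int
    system_id_mac: str
    system_priority: int

# Fields that must be identical on every leaf attached to the same ES.
SHARED_FIELDS = ["esi", "multi_homing_mode", "admin_key",
                 "system_id_mac", "system_priority"]

def mismatches(a: EsAttachment, b: EsAttachment) -> List[str]:
    """Return the names of the shared fields that differ between two leaves."""
    return [f for f in SHARED_FIELDS if getattr(a, f) != getattr(b, f)]

LEAF_2 = EsAttachment("LEAF-2", "01:24:24:24:24:24:24:00:00:01",
                      "all-active", 24, "00:00:00:00:00:24", 24)
LEAF_4 = replace(LEAF_2, leaf="LEAF-4")

print(mismatches(LEAF_2, LEAF_4))                        # consistent pair
print(mismatches(LEAF_2, replace(LEAF_4, admin_key=25))) # drifted admin-key
```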
3. After the access bridged subinterfaces are created, create the vxlan-interfaces to facilitate connectivity between network-instances across the IP fabric.
The mac-vrf network-instances require vxlan-interfaces of type bridged, and the ip-vrf network-instances require vxlan-interfaces of type routed. The following example shows the vxlan-interfaces for all network-instances on the LEAF-2 and LEAF-4 nodes.
vxlan-interface configuration
// vxlan-interfaces on Leaf-2
--{ [FACTORY] + candidate shared default }--[ tunnel-interface * ]--
# info
    tunnel-interface vxlan1 {
        vxlan-interface 2 {
            type bridged
            ingress {
                vni 2
            }
        }
        vxlan-interface 24 {
            type bridged
            ingress {
                vni 24
            }
        }
    }
    tunnel-interface vxlan2 {
        vxlan-interface 10 {
            type routed
            ingress {
                vni 10
            }
        }
    }
// vxlan-interfaces on Leaf-4
--{ [FACTORY] + candidate shared default }--[ tunnel-interface * ]--
# info
    tunnel-interface vxlan1 {
        vxlan-interface 4 {
            type bridged
            ingress {
                vni 4
            }
        }
        vxlan-interface 24 {
            type bridged
            ingress {
                vni 24
            }
        }
    }
    tunnel-interface vxlan2 {
        vxlan-interface 10 {
            type routed
            ingress {
                vni 10
            }
        }
    }
4. Configure the IRB subinterfaces that link the mac-vrf and ip-vrf network-instances for inter-subnet forwarding.
See IRB subinterface considerations for details on configuring IRB subinterfaces.
IRB subinterface configuration
// IRB interfaces in Leaf-2
--{ [FACTORY] + candidate shared default }--[ interface irb* ]--
# info
    interface irb0 {
        subinterface 2 {
            ipv4 {
                address 20.1.1.2/24 {
                }
                arp {
                    learn-unsolicited true
                }
            }
            anycast-gw {
            }
        }
        subinterface 24 {
            ipv4 {
                address 101.1.1.2/24 {
                }
                address 101.1.1.254/24 {
                    anycast-gw true
                    primary
                }
                address 102.1.1.254/24 {
                }
                arp {
                    learn-unsolicited true
                    debug [
                        messages
                    ]
                    host-route {
                        populate dynamic {
                        }
                    }
                    evpn {
                        advertise dynamic {
                        }
                    }
                }
            }
            anycast-gw {
            }
        }
    }
// IRB interfaces in Leaf-4
--{ [FACTORY] + candidate shared default }--[ interface irb* ]--
# info
    interface irb0 {
        subinterface 4 {
            ipv4 {
                address 104.1.1.4/24 {
                }
                address 104.1.1.254/24 {
                    anycast-gw true
                    primary
                }
                arp {
                    learn-unsolicited true
                    debug [
                        messages
                    ]
                }
            }
            anycast-gw {
            }
        }
        subinterface 24 {
            ipv4 {
                address 101.1.1.4/24 {
                }
                address 101.1.1.254/24 {
                    anycast-gw true
                    primary
                }
                arp {
                    learn-unsolicited true
                    debug [
                        messages
                    ]
                    host-route {
                        populate dynamic {
                        }
                    }
                    evpn {
                        advertise dynamic {
                        }
                    }
                }
            }
            anycast-gw {
            }
        }
    }
5. Configure the network-instances of type mac-vrf and associate the bridged subinterfaces, IRB subinterfaces, and vxlan-interfaces. Then enable bgp-evpn.
Network-instance configuration and association
// MAC-VRFs in Leaf-2
--{ [FACTORY] + candidate shared default }--[ network-instance BD* ]--
# info
    network-instance BD2 {
        type mac-vrf
        interface ethernet-1/22.2 {
        }
        interface irb0.2 {
        }
        vxlan-interface vxlan1.2 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    admin-state enable
                    vxlan-interface vxlan1.2
                    evi 2
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:2
                        import-rt target:64500:2
                    }
                }
            }
        }
    }
    network-instance BD24 {
        type mac-vrf
        interface ethernet-1/12.24 {
        }
        interface irb0.24 {
        }
        interface lag1.24 {
        }
        vxlan-interface vxlan1.24 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    admin-state enable
                    vxlan-interface vxlan1.24
                    evi 24
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:24
                        import-rt target:64500:24
                    }
                }
            }
        }
    }
// MAC-VRFs in Leaf-4
--{ [FACTORY] + candidate shared default }--[ network-instance BD* ]--
# info
    network-instance BD24 {
        type mac-vrf
        interface irb0.24 {
        }
        interface lag1.24 {
        }
        vxlan-interface vxlan1.24 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    admin-state enable
                    vxlan-interface vxlan1.24
                    evi 24
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:24
                        import-rt target:64500:24
                    }
                }
            }
        }
    }
    network-instance BD4 {
        type mac-vrf
        interface ethernet-1/13.4 {
        }
        interface irb0.4 {
        }
        vxlan-interface vxlan1.4 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    admin-state enable
                    vxlan-interface vxlan1.4
                    evi 4
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:4
                        import-rt target:64500:4
                    }
                }
            }
        }
    }
6. Configure IP-VRF-10 with bgp-evpn enabled with the EVPN IFL model.
The IP-VRF is configured with the IRB subinterfaces previously created for BD2, BD4, and BD24. The IP-VRF instances are connected by VXLAN tunnels, so routed vxlan-interfaces need to be associated with IP-VRF-10. A loopback interface is configured in the IP-VRF to test connectivity among IP-VRFs of the same tenant. In the following example, LEAF-2's IP-VRF is also configured with a BGP PE-CE neighbor to CE-3.
IP-VRF-10 configuration
// IP-VRF-10 in Leaf-2
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF* ]--
# info
    network-instance IP-VRF-10 {
        type ip-vrf
        interface irb0.2 {
        }
        interface irb0.24 {
        }
        interface lo10.2 {
        }
        vxlan-interface vxlan2.10 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    vxlan-interface vxlan2.10
                    evi 10
                    ecmp 2
                }
            }
            bgp {
                admin-state enable
                autonomous-system 645002
                router-id 2.2.2.2
                group eBGP-PE-CE {
                    admin-state enable
                    export-policy export-all
                    import-policy import-all
                    ipv4-unicast {
                        admin-state enable
                    }
                }
                neighbor 20.1.1.3 {
                    peer-as 645003
                    peer-group eBGP-PE-CE
                    local-as 645002 {
                    }
                    transport {
                        local-address 20.1.1.2
                    }
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:10
                        import-rt target:64500:10
                    }
                }
            }
        }
    }
// IP-VRF-10 in Leaf-4
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF* ]--
# info
    network-instance IP-VRF-10 {
        type ip-vrf
        interface irb0.4 {
        }
        interface irb0.24 {
        }
        interface lo10.2 {
        }
        vxlan-interface vxlan2.10 {
        }
        protocols {
            bgp-evpn {
                bgp-instance 1 {
                    vxlan-interface vxlan2.10
                    evi 10
                    ecmp 2
                }
            }
            bgp-vpn {
                bgp-instance 1 {
                    route-target {
                        export-rt target:64500:10
                        import-rt target:64500:10
                    }
                }
            }
        }
    }
7. Review the changes, and if correct, commit the changes.
A:dut2# commit stay
--{ candidate shared default }--[ ]--
IRB subinterface considerations
The following are considerations for configuring IRB subinterfaces (performed in Step 4).
IRB subinterfaces on BDs that are distributed to multiple leaf nodes must be configured with at least one anycast-gw IP address and an anycast-gw MAC address. When the anycast-gw container is configured, the anycast-gw MAC address is auto-derived as 00:00:5E:00:01:01 in all the leaf nodes. The MAC address can also be explicitly configured if needed. In addition:
The same anycast-gw IP and MAC address must be configured on all IRBs of the same BD.
All the hosts attached to the same BD use the same default-gateway (that is, the anycast-gw IP) irrespective of the leaf they are connected to. For example, irb0 subinterfaces 24 are configured with the same anycast-gw IP on LEAF-2 and LEAF-4 (101.1.1.254) in the configuration example.
Default gateway redundancy in DCs is realized using anycast-gws, and not VRRP. Anycast-gws avoid upstream tromboning for hosts that are multi-homed to multiple leaf nodes or for single-homed hosts that move to other leaf nodes of the same BD.
IRB subinterfaces on BDs that are distributed may also be configured with non-anycast-gw IP addresses. This is only done when separate IPs are needed to check connectivity per leaf. For example, LEAF-2 is configured with non-anycast-gw IPs 101.1.1.2 and 102.1.1.254, and LEAF-4 is configured with 104.1.1.4. Because anycast-gw IPs exist on multiple nodes, they should not be used with ICMP tools to check the availability of a specific leaf. Non-anycast-gw IPs should be used instead.
IRB subinterfaces on distributed BDs should be configured with the following commands, as shown for subinterface 24 in LEAF-2 and LEAF-4 in the configuration example:
arp learn-unsolicited true - Triggers the learning of ARP entries out of any ARP packet arriving at the IRB subinterface, regardless of whether there was an ARP-Request issued from the IRB.
arp host-route populate dynamic - Creates host routes (arp-nd) in the IP-VRF route-table from learned dynamic ARP entries in the IRB. The arp-nd host routes are then advertised to the remote leaf nodes. This assists the routing of downstream traffic to a specified host without hair-pinning traffic via another leaf connected to the same BD of the host, but not connected to the host directly.
arp evpn advertise dynamic - Advertises EVPN MAC/IP routes that include the MAC and the IP of the dynamic ARP entries. The advertisement of these routes synchronizes the ARP caches in all the IRB subinterfaces of the same BD.
IRB subinterfaces on BDs that are not distributed (that is, BDs attached to only one Leaf node) do not need to be configured with the following:
arp host-route populate dynamic, because downstream routing is always direct to the connected leaf
arp evpn advertise dynamic as ARP entries do not need to be synchronized with any other node
Examples of non-distributed BDs are BD2, BD4, and BD3 as shown in Example of EVPN-VXLAN IP-VRF domain. Their corresponding IRB subinterfaces do not create host-routes or advertise EVPN MAC/IP routes for the ARP entries.
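The considerations above can be summarized as a small decision helper. This is a minimal sketch under the assumptions of this chapter's example topology: the BD_LEAVES mapping and the function names are hypothetical, and the helper only covers the two ARP commands that the text ties to distributed BDs plus the rule that the anycast-gw IP must match on all IRBs of the same BD.

```python
from typing import Dict, List

# Which leaf nodes each BD is attached to in the example topology.
BD_LEAVES: Dict[str, List[str]] = {
    "BD2": ["LEAF-2"],
    "BD3": ["LEAF-3"],
    "BD4": ["LEAF-4"],
    "BD24": ["LEAF-2", "LEAF-4"],
}

# Commands that are only needed on IRB subinterfaces of distributed BDs.
DISTRIBUTED_ONLY = [
    "arp host-route populate dynamic",
    "arp evpn advertise dynamic",
]

def is_distributed(bd: str) -> bool:
    """A BD is distributed when it is attached to more than one leaf."""
    return len(set(BD_LEAVES[bd])) > 1

def distributed_only_commands(bd: str) -> List[str]:
    """Return the ARP commands that this BD's IRB subinterface needs."""
    return list(DISTRIBUTED_ONLY) if is_distributed(bd) else []

def anycast_gw_consistent(anycast_ip_per_leaf: Dict[str, str]) -> bool:
    """All IRBs of the same BD must use the same anycast-gw IP."""
    return len(set(anycast_ip_per_leaf.values())) == 1

print(distributed_only_commands("BD24"))
print(distributed_only_commands("BD3"))
print(anycast_gw_consistent({"LEAF-2": "101.1.1.254", "LEAF-4": "101.1.1.254"}))
```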
Configuring EVPN IFL interoperability to EVPN IFF unnumbered model
While EVPN IFL for VXLAN is supported by most DC vendors, Nuage WBX or VSC/VRS use the EVPN IFF Unnumbered model. By default, the SR Linux EVPN IFL (interface-less) model does not inter-operate with the EVPN IFF (interface-ful) model. However, it is possible to configure the SR Linux EVPN IFL model to inter-operate with the EVPN IFF Unnumbered model.
For more information about EVPN IFL vs EVPN IFF models, see the SR Linux EVPN-VXLAN User Guide and draft-ietf-bess-evpn-prefix-advertisement.
To configure inter-operability in IP-VRF-10, configure the advertise-gateway-mac command as shown in the following example.
EVPN IFF inter-operability in IP-VRF-10 configuration
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# protocols bgp-evpn bgp-instance 1 routes route-table mac-ip advertise-gateway-mac <value> ?
usage: advertise-gateway-mac <true|false>
If set to true in an ip-vrf where bgp-evpn is enabled, a MAC/IP route
containing the gateway-MAC is advertised.
This gateway-MAC matches the MAC advertised along with the EVPN IFL routes
type 5 for the ip-vrf network-instance. This advertisement is needed so that
the EVPN IFL (Interface-Less) model in the ip-vrf can interoperate with a remote
system working in EVPN IFF (Interface-ful) Unnumbered mode.
Positional arguments:
value [true|false, default false]
When set to true, the node advertises a MAC/IP route using all of the following:
- Gateway-mac for IP-VRF-10 (that is, the system-mac)
- RD/RT, next-hop, and VNI of IP-VRF-10
- Null IP address, ESI, and Ethernet Tag ID
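The content of this additional MAC/IP route can be modeled as a simple record. The following is a minimal sketch: the MacIpRoute class and gateway_mac_route function are hypothetical, and the sample values are the LEAF-3 values that appear in this chapter (system-mac 00:01:03:ff:00:00, RD 3.3.3.3:10, next-hop 3.3.3.3, VNI 10).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MacIpRoute:
    rd: str
    route_target: str
    next_hop: str
    vni: int
    mac: str
    ip: Optional[str] = None                     # null IP address
    esi: str = "00:00:00:00:00:00:00:00:00:00"   # null ESI
    ethernet_tag: int = 0                        # null Ethernet Tag ID

def gateway_mac_route(system_mac: str, rd: str, route_target: str,
                      next_hop: str, vni: int,
                      advertise_gateway_mac: bool) -> Optional[MacIpRoute]:
    """Build the gateway-MAC route, or None when the knob is false (default)."""
    if not advertise_gateway_mac:
        return None
    return MacIpRoute(rd=rd, route_target=route_target,
                      next_hop=next_hop, vni=vni, mac=system_mac)

route = gateway_mac_route("00:01:03:ff:00:00", "3.3.3.3:10",
                          "target:64500:10", "3.3.3.3", 10, True)
print(route)
```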
MAC/IP route advertisement
In the following example, the MAC/IP route advertised is from LEAF-3. The MAC address matches the system-mac advertised in any local RT5s.
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# protocols bgp-evpn bgp-instance 1 routes route-table mac-ip advertise-gateway-mac true
--{ [FACTORY] +* candidate shared default }--[ network-instance IP-VRF-10 ]--
# commit stay
All changes have been committed. Starting new transaction.
2021-04-15T01:18:24.688185-07:00 dut3 local6|DEBU sr_bgp_mgr: bgp|4942|5135|1393922|D:
VR default (1) Peer 1: 1.1.1.1 UPDATE: Peer 1: 1.1.1.1 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 81
Flag: 0x90 Type: 14 Len: 44 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 3.3.3.3
Type: EVPN-MAC Len: 33 RD: 3.3.3.3:10 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:01:03:ff:00:00, IP len: 0, IP: NULL, label1: 10
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:10
bgp-tunnel-encap:VXLAN
For IPv6, Nuage WBX devices support two EVPN L3 IPv6 modes: IFF unnumbered and IFF numbered. The SR Linux interoperability mode enabled by the advertise-gateway-mac command only works with devices that use EVPN IFF unnumbered. This is because EVPN IFL and EVPN IFF unnumbered models both use the same format in the IP prefix route, and differ only in the additional MAC/IP route for the gateway-mac. EVPN IFL and EVPN IFF numbered models have different IP prefix route formats, and cannot inter-operate.
Checking the EVPN IFL model in IP-VRFs
After configuration, check the state of IP-VRF-10 on the three leaf nodes and verify basic connectivity.
When the leaf nodes attached to IP-VRF-10 have exchanged at least one EVPN IP prefix route, the bgp_mgr on each leaf requests the fib_mgr to create a VXLAN tunnel to each next-hop of the received EVPN type 5 routes (RT5s), unless the tunnel already exists.
When a VXLAN tunnel to the remote VTEP exists, the bgp_mgr requests the next-hop resolution to the fib_mgr, and if it resolves, the RT5 is installed in the IP-VRF route-table. Using LEAF-3 as a reference, you can check that RT5s are received from the two remote leaf nodes, and then verify that VXLAN tunnels exist to their VTEPs and the RT5s are installed in the route-table. Loopbacks are configured on each IP-VRF-10 instance to verify reachability.
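The sequence described above (create the tunnel if needed, resolve the next-hop, then install the RT5) can be sketched as follows. This is a simplified model and not the bgp_mgr or fib_mgr implementation: the function process_rt5 and the resolvable set, which stands in for next-hops that the underlay can resolve, are hypothetical.

```python
from typing import Dict, List, Set

def process_rt5(prefix: str, next_hop: str, tunnels: Set[str],
                resolvable: Set[str],
                route_table: Dict[str, List[str]]) -> bool:
    """Process one received RT5; return True if it was installed."""
    # Create a VXLAN tunnel to the next-hop unless one already exists.
    if next_hop not in tunnels:
        tunnels.add(next_hop)
    # Install the route only if the next-hop resolves.
    if next_hop not in resolvable:
        return False
    route_table.setdefault(prefix, []).append(next_hop)
    return True

# LEAF-3 receiving the loopback prefixes of LEAF-2 and LEAF-4.
tunnels: Set[str] = set()
route_table: Dict[str, List[str]] = {}
resolvable = {"2.2.2.2", "4.4.4.4"}

process_rt5("22.22.22.22/32", "2.2.2.2", tunnels, resolvable, route_table)
process_rt5("44.44.44.44/32", "4.4.4.4", tunnels, resolvable, route_table)
print(sorted(tunnels))
print(route_table)
```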
Checking IP-VRF-10 state and connectivity
The following commands check the RT5s for the loopbacks 22.22.22.22 and 44.44.44.44 advertised by the remote leaf nodes. Verify that the routes contain the expected IP-VRF-10 VNI, route-target, and mac-nh, which is used as the inner destination MAC when sending VXLAN packets to the prefix.
Check IP-VRF-10 state and connectivity
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance default protocols bgp routes evpn route-type 5 prefix 22.22.22.22/32 detail
--------------------------------------------------------------------------------------------------
Show report for the EVPN routes in network-instance "default"
-------------------------------------------------------------------------------------------------
Route Distinguisher: 2.2.2.2:10
Tag-ID : 0
ip-prefix-len : 32
ip-prefix : 22.22.22.22/32
neighbor : 2.2.2.2
Gateway IP : 0.0.0.0
Received paths : 1
Path 1: <Best,Valid,Used,>
ESI : 00:00:00:00:00:00:00:00:00:00
VNI : 10
Route source : neighbor 2.2.2.2 (last modified 46m16s ago)
Route preference: No MED, LocalPref is 100
Atomic Aggr : false
BGP next-hop : 2.2.2.2
AS Path : i
Communities : [target:64500:10, mac-nh:00:01:02:ff:00:00, bgp-tunnel-encap:VXLAN]
RR Attributes : No Originator-ID, Cluster-List is []
Aggregation : None
Unknown Attr : None
Invalid Reason : None
Tie Break Reason: none
---------------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance default protocols bgp routes evpn route-type 5 prefix 44.44.44.44/32 summary
---------------------------------------------------------------------------------------------------
Show report for the BGP route table of network-instance "default"
--------------------------------------------------------------------------------------------------
Status codes: u=used, *=valid, >=best, x=stale
Origin codes: i=IGP, e=EGP, ?=incomplete
-------------------------------------------------------------------------------------------------
BGP Router ID: 3.3.3.3 AS: 3333 Local AS: 3333
------------------------------------------------------------------------------------------------
Type 5 IP Prefix Routes
+--------+---------------+-----+----------------+----------+----------+-----+---------+
| Status | Route- | Tag | IP-address | neighbor | Next-Hop | VNI | Gateway |
| | distinguisher | -ID | | | | | |
+========+===============+=====+================+==========+==========+=====+=========+
| u*> | 4.4.4.4:10 | 0 | 44.44.44.44/32 | 4.4.4.4 | 4.4.4.4 | 10 | 0.0.0.0 |
+--------+---------------+-----+----------------+----------+----------+-----+---------+
---------------------------------------------------------------------------------------
1 IP Prefix routes 1 used, 1 valid
---------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Checking for VXLAN tunnel creation
When the routes are correct, the VXLAN tunnels are created.
Check for VXLAN tunnel creation
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance default tunnel-table all
-----------------------------------------------------------------------------------------------------
Show report for network instance "default" tunnel table
------------------------------------------------------------------------------------------------------
+-------------+-----------+-------+-------+--------+------------+---------+--------------------------+
| IPv4 Prefix | Owner | Type | Index | Metric | Preference |Fib-prog | Last Update |
+=============+===========+=======+=======+========+============+=========+==========================+
| 2.2.2.2/32 | vxlan_mgr | vxlan | 7 | 0 | 0 | Y | 2021-04-13T10:43:09.483Z |
| 4.4.4.4/32 | vxlan_mgr | vxlan | 6 | 0 | 0 | Y | 2021-04-13T10:43:09.144Z |
+-------------+-----------+-------+-------+--------+------------+---------+--------------------------+
------------------------------------------------------------------------------------------------------
2 VXLAN tunnels, 2 active, 0 inactive
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
Show report for network instance "default" tunnel table
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Checking for remote VTEPS and associated destinations
The following commands show the remote VTEPs and the associated destinations. A destination is the combination of the VTEP and VNI that is created when the EVPN routes are received and the VXLAN tunnel is created. IP destinations are created from RT5s.
Check for remote VTEPs and associated destinations
# show tunnel vxlan-tunnel vtep 2.2.2.2
----------------------------------------------------------------------------
Show report for vxlan-tunnels vtep
---------------------------------------------------------------------------
VTEP Address: 2.2.2.2
Index : 202418627561
Last Change : 2021-04-13T11:47:09.000Z
---------------------------------------------------------------------------
Destinations
---------------------------------------------------------------------------
+------------------+-----------------+------------+-----------------------+
| Tunnel Interface | VXLAN Interface | Egress VNI | Type |
+==================+=================+============+=======================+
| vxlan1 | 1 | 1 | multicast-destination |
| vxlan2 | 10 | 10 | ip-destination |
+------------------+-----------------+------------+-----------------------+
---------------------------------------------------------------------------
1 bridged destinations, 1 multicast, 0 unicast, 0 es
1 routed destinations
--------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show tunnel vxlan-tunnel vtep 4.4.4.4
---------------------------------------------------------------------------
Show report for vxlan-tunnels vtep
---------------------------------------------------------------------------
VTEP Address: 4.4.4.4
Index : 202418627553
Last Change : 2021-04-14T15:40:23.000Z
--------------------------------------------------------------------------
Destinations
---------------------------------------------------------------------------
+------------------+-----------------+------------+-----------------------+
| Tunnel Interface | VXLAN Interface | Egress VNI | Type |
+==================+=================+============+=======================+
| vxlan1 | 1 | 1 | multicast-destination |
| vxlan1 | 1 | 1 | unicast-destination |
| vxlan2 | 10 | 10 | ip-destination |
+------------------+-----------------+------------+-----------------------+
---------------------------------------------------------------------------
2 bridged destinations, 1 multicast, 1 unicast, 0 es
1 routed destinations
--------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Checking IP-VRF-10 route table
The following command checks the IP-VRF-10 route table to ensure all the remote subnets and hosts are received and installed. All interface and local routes are automatically advertised in RT5s. Because ECMP=2 is configured in the IP-VRF-10, there are two ECMP paths for the 101.1.1.0/24 subnet, which is attached to both LEAF-2 and LEAF-4.
Check IP-VRF-10 route table
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show route-table ipv4-unicast summary
------------------------------------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
------------------------------------------------------------------------------------------------
+-----------------+-----+--------+------------+--------+------+--------------------+-----------+
| Prefix | ID | Active | Route Type | Metric | Pref | Next-hop (Type) | Next-hop |
| | | | | | | | Interface |
+=================+=====+========+============+========+======+====================+===========+
| 31.31.31.31/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 20.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 20.1.1.3/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 22.22.22.22/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 33.33.33.33/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 44.44.44.44/32 | 0 | true | bgp-evpn | 0 | 170 | 4.4.4.4 (indirect) | None |
| 101.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| | | | | | | 4.4.4.4 (indirect) | None |
| 101.1.1.1/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 102.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 103.1.1.0/24 | 0 | true | local | 0 | 0 | 103.1.1.3 (direct) | irb0.3 |
| 103.1.1.1/32 | 0 | true | arp-nd | 0 | 1 | 103.1.1.1 (direct) | irb0.3 |
| 103.1.1.3/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 103.1.1.254/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 103.1.1.255/32 | 0 | true | host | 0 | 0 | None (broadcast) | None |
| 104.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 4.4.4.4 (indirect) | None |
+-----------------+-----+--------+------------+--------+------+--------------------+-----------+
------------------------------------------------------------------------------------------------
15 IPv4 routes total
15 IPv4 prefixes with active routes
1 IPv4 prefixes with active ECMP routes
-----------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
Checking route-table state for a RT5
Use the following command to check the route-table state for a RT5. This is useful for understanding how the RT5 is resolved to a VXLAN tunnel and which VNI and inner source and destination MACs are used when sending VXLAN packets to that route. The following example uses the ECMP route 101.1.1.0/24 on LEAF-3. The route's next-hop group has two separate next-hops pointing at the remote LEAF-2 and LEAF-4 VTEPs:
Check route-table state for a RT5
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# info from state route-table ipv4-unicast route 101.1.1.0/24 id 0
route-table {
ipv4-unicast {
route 101.1.1.0/24 id 0 {
route-type bgp-evpn
route-owner bgp_evpn_mgr
metric 0
preference 170
active true
last-app-update "a day ago"
next-hop-group 202418627581
resilient-hash false
fib-programming {
status success
}
}
}
}
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# info from state route-table next-hop-group 202418627581 next-hop *
route-table {
next-hop-group 202418627581 {
next-hop 0 {
next-hop 202418627569
resolved true
}
next-hop 1 {
next-hop 202418627565
resolved true
}
}
}
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# info from state route-table next-hop 202418627569
route-table {
next-hop 202418627569 {
type indirect
ip-address 2.2.2.2
resolving-tunnel {
ip-prefix 2.2.2.2/32
tunnel-type vxlan
tunnel-owner vxlan_mgr
}
vxlan {
vni 10
source-mac 00:01:03:FF:00:00
destination-mac 00:01:02:FF:00:00
}
}
}
Monitoring pings
The following example monitors pings between the local LEAF-3 loopback and LEAF-2's loopback (22.22.22.22), and shows that the inner source and destination MACs associated with the RT5's next-hop are the ones used in the actual packets. Note that the source-mac is the chassis MAC advertised in the mac-nh of the local RT5s:
Monitor pings
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
// a ping is sent from Leaf-3 to 22.22.22.22 (the loopback on Leaf-2’s IP-VRF-10)
# network-instance IP-VRF-10
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# ping 22.22.22.22
Using network instance IP-VRF-10
PING 22.22.22.22 (22.22.22.22) 56(84) bytes of data.
64 bytes from 22.22.22.22: icmp_seq=1 ttl=64 time=4.88 ms
64 bytes from 22.22.22.22: icmp_seq=2 ttl=64 time=4.76 ms
^C
--- 22.22.22.22 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 4.767/4.827/4.888/0.092 ms
// the monitor command catches the inner packet sent on VXLAN
--{ [FACTORY] +! candidate shared default }--[ network-instance IP-VRF-10 ]--
# /tools system traffic-monitor destination-address 33.33.33.33 protocol icmp
Running as user "root" and group "root". This could be dangerous.
Capturing on 'monit'
1 0.000000000 ethernet-1/1 00:01:02:ff:00:00 00:01:03:ff:00:00
22.22.22.22 33.33.33.33 ICMP 146 Echo (ping) reply id=0x580c,
seq=1/256, ttl=64
// the source-mac of the next-hop (the destination MAC of the captured reply) is the local chassis MAC:
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state platform chassis mac-address
platform {
chassis {
mac-address 00:01:03:FF:00:00
}
}
// that source MAC is also advertised in the RT5’s mac-nh
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance default bgp-rib evpn rib-in-out rib-out-post
ip-prefix-routes 3.3.3.3:10 ethernet-tag-id 0 ip-prefix-length 32 ip-prefix 33.33.33.33/32
neighbor 2.2.2.2
network-instance default {
bgp-rib {
evpn {
rib-in-out {
rib-out-post {
ip-prefix-routes 3.3.3.3:10 ethernet-tag-id 0 ip-prefix-length 32
ip-prefix 33.33.33.33/32 neighbor 2.2.2.2 {
esi 00:00:00:00:00:00:00:00:00:00
gateway-ip 0.0.0.0
vni 10
attr-id 132
next-hop 3.3.3.3
}
}
}
}
}
}
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance default bgp-rib attr-sets attr-set rib-out index 132
network-instance default {
bgp-rib {
attr-sets {
attr-set rib-out index 132 {
origin igp
atomic-aggregate false
next-hop 3.3.3.3
med 0
local-pref 100
aggregator {
}
pmsi-tunnel {
}
communities {
ext-community [
target:64500:10
mac-nh:00:01:03:ff:00:00
bgp-tunnel-encap:VXLAN
]
}
unknown-attributes {
}
}
}
}
}
The received EVPN IFL IP Prefix routes are only installed in the IP-VRF-10 route-table if:
- the import route-target matches the RT5 route-target
- the RT5's VNI matches the VNI of the vxlan-interface in the IP-VRF-10
- the RT5's gateway-ip is zero
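The three install conditions above can be sketched as a simple predicate. This is an illustrative model only, not SR Linux code: the `Rt5` and `IpVrf` classes and the `is_installable` function are hypothetical names, and the values are taken from the IP-VRF-10 outputs in this chapter.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rt5:
    """Hypothetical model of a received EVPN IP Prefix (type 5) route."""
    prefix: str
    route_targets: frozenset
    vni: int
    gateway_ip: str = "0.0.0.0"


@dataclass(frozen=True)
class IpVrf:
    """Hypothetical model of the local ip-vrf settings that matter here."""
    name: str
    import_rts: frozenset
    vni: int  # VNI of the vxlan-interface attached to the ip-vrf


def is_installable(route: Rt5, vrf: IpVrf) -> bool:
    """Return True only if all three install conditions hold."""
    rt_match = bool(route.route_targets & vrf.import_rts)
    vni_match = route.vni == vrf.vni
    gw_is_zero = ipaddress.ip_address(route.gateway_ip).is_unspecified
    return rt_match and vni_match and gw_is_zero


vrf10 = IpVrf("IP-VRF-10", frozenset({"target:64500:10"}), 10)
good = Rt5("33.33.33.33/32", frozenset({"target:64500:10"}), 10)
wrong_vni = Rt5("33.33.33.33/32", frozenset({"target:64500:10"}), 20)

print(is_installable(good, vrf10))       # True
print(is_installable(wrong_vni, vrf10))  # False
```

A route that fails any one of the checks is simply not installed in this model; the sketch does not attempt to represent how the route is kept in the BGP RIB.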
Additional guidelines:
- Importing an RT5 into multiple ip-vrf network-instances is not supported due to the VNI restriction: an ip-vrf can only use a single VNI for ingress and egress VXLAN packets. This is a TD3 limitation.
- The route next-hop cannot be changed in ip-vrf network-instances. It is always the system-ip in this release.
- The ip-vrf bgp-evpn bgp instance can be oper-down for the same reasons that the bgp-evpn bgp instance can be oper-down in mac-vrfs. See EVPN-VXLAN for layer-2 and multi-homing for details.
- VXLAN statistics are also accounted for when EVPN-IFL is used.
- No MTU checks are done for VXLAN in EVPN-IFL. If the routed packet plus the VXLAN overhead exceeds the underlay interface MTU of the egress interface in the default network-instance, the packet is still encapsulated and sent to the remote leaf. No drops occur and no statistics are incremented.
- The outer TTL for VXLAN packets is always initialized to 255 and is not copied or propagated from or to the inner IP packet.
Checking PE-CE routing on an IP-VRF with EVPN-IFL
In an EVPN-VXLAN Layer 3 network, PE-CE routing refers to the unicast routing between a CE connected to a BD in a leaf node and the IRB subinterface of the IP-VRF connected to the same BD. Static or BGP routing is supported in SR Linux. BFD can also be used between the IRB and the CE.
Checking PE-CE routing on IP-VRF
Example of EVPN-VXLAN IP-VRF domain depicts a PE-CE BGP session between CE-3 and IP-VRF-10 in LEAF-2. The following configuration is needed in IP-VRF-10 to enable the PE-CE BGP session to CE-3.
Check PE-CE routing on IP-VRF
// IP-VRF-10 in Leaf-2
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF* ]--
# info
network-instance IP-VRF-10 {
type ip-vrf
interface irb0.2 {
}
interface irb0.24 {
}
interface lo10.2 {
}
vxlan-interface vxlan2.10 {
}
protocols {
bgp-evpn {
bgp-instance 1 {
vxlan-interface vxlan2.10
evi 10
ecmp 2
}
}
bgp {
admin-state enable
autonomous-system 645002
router-id 2.2.2.2
group eBGP-PE-CE {
admin-state enable
export-policy export-all
import-policy import-all
ipv4-unicast {
admin-state enable
}
}
neighbor 20.1.1.3 {
peer-as 645003
peer-group eBGP-PE-CE
local-as 645002 {
}
transport {
local-address 20.1.1.2
}
}
}
bgp-vpn {
bgp-instance 1 {
route-target {
export-rt target:64500:10
import-rt target:64500:10
}
}
}
}
}
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show protocols bgp neighbor 20.1.1.3
---------------------------------------------------------------------------------------------------
BGP neighbor summary for network-instance "IP-VRF-10"
Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
+-----------+----------+------------+--------+---------+------------+---------+---------+------------+
| Net-Inst | Peer | Group | Flags | Peer-AS | State | Uptime | AFI/ | [Rx/Active |
| | | | | | | | SAFI | /Tx] |
+===========+============+==========+========+=========+============+=========+======================+
| IP-VRF-10 | 20.1.1.3 | eBGP-PE-CE | S | 645003 | established| 1d:21h: | ipv4- | [2/1/11] |
| | | | | | | 46m:56s | unicast | |
+------------+----------+-----------+--------+---------+------------+---------+---------+------------+
-----------------------------------------------------------------------------------------------------
1 configured neighbors, 1 configured sessions are established,0 disabled peers
0 dynamic peers
PE-CE EBGP session: import and export policies
By default, all local routes in the IP-VRF route-table are automatically advertised in EVPN-IFL routes. This includes static routes, local routes, IGP routes, arp-nd host routes, and so on. Consider the following for routes coming from or going to a PE-CE EBGP session:
- EVPN-IFL to PE-CE EBGP: An export policy must be configured so that EVPN IFL routes can be re-advertised to a CE on the PE-CE BGP session.
- PE-CE EBGP to EVPN-IFL: Either an import policy to accept the routes or ebgp-default-policy import-reject-all false must be configured so that the BGP routes are re-advertised to EVPN-IFL.
For example, the following two policies are configured to import and export all routes:
PE-CE EBGP session: import and export policies
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# /info routing-policy policy import-all
routing-policy {
policy import-all {
statement 1 {
action {
accept {
}
}
}
}
}
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# /info routing-policy policy export-all
routing-policy {
policy export-all {
statement 1 {
action {
accept {
}
}
}
}
}
Additional PE-CE considerations
BGP PE-CE sessions can only be established with primary IP addresses. Therefore, in an IRB with both an anycast-gw-ip and a non-anycast-gw-ip, the BGP session can be set up against the non-anycast-gw-ip only if it is configured as primary.
A BGP session is not established if the configured BGP local-address for that session is a non-primary address. Adding a secondary address on an interface where the primary address has established a BGP session is supported.
BGP PE-CE sessions and primary IP addresses
In the following example, the local IP address is primary, but not an anycast-gw IP:
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# /info from state interface irb0 subinterface 2
interface irb0 {
subinterface 2 {
admin-state enable
ip-mtu 1500
name irb0.2
ifindex 1082130435
oper-state up
last-change "a day ago"
ipv4 {
allow-directed-broadcast false
address 20.1.1.2/24 {
anycast-gw false
origin static
primary
status preferred
}
arp {
duplicate-address-detection true
timeout 14400
learn-unsolicited true
debug [
messages
]
neighbor 20.1.1.3 {
link-layer-address 00:01:03:FF:00:0B
origin dynamic
expiration-time "an hour from now"
}
host-route {
populate dynamic {
}
}
evpn {
}
}
}
anycast-gw {
virtual-router-id 1
anycast-gw-mac 00:00:5E:00:01:01
anycast-gw-mac-origin vrid-auto-derived
}
Changing the route preference
In SR Linux, the route selection across BGP families (EVPN-IFL vs PE-CE IPv4/v6) occurs based on the route-table preference. For example, if the same prefix 31.31.31.31/32 is received on the IP-VRF-10's route-table via BGP PE-CE (ipv4 family) and via EVPN-IFL, the route with the lowest route-table preference wins. By default, the preference for both EVPN-IFL and BGP PE-CE is set to 170. Therefore, for the PE-CE route to be selected, change the preference for the PE-CE routes to a value lower than 170 as shown in the following example.
Changing the route preference
--{ [FACTORY] +* candidate shared default }--[ network-instance IP-VRF-10 ]--
# diff
protocols {
bgp {
preference {
+ ebgp 160
}
}
}
--{ [FACTORY] +* candidate shared default }--[ network-instance IP-VRF-10 ]--
# commit stay
All changes have been committed. Starting new transaction.
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show route-table ipv4-unicast prefix 31.31.31.31/32 detail
---------------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
---------------------------------------------------------------------------
Destination : 31.31.31.31/32
ID : 0
Route Type : bgp
Metric : 0
Preference : 160
Active : true
Last change : 2021-04-15T09:05:53.745Z
Resilient hash: false
--------------------------------------------------------------------------
Next hops: 1 entries
20.1.1.3 (indirect) resolved by 20.1.1.3/32 (arp-nd)
via 20.1.1.3 (direct) via [irb0.2]
--------------------------------------------------------------------------
Destination : 31.31.31.31/32
ID : 1
Route Type : bgp-evpn
Metric : 0
Preference : 170
Active : false
Last change : 2021-04-15T08:58:50.600Z
Resilient hash: false
-----------------------------------------------------------------------------
Next hops: 1 entries
3.3.3.3 (indirect) resolved by None (None)
-----------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
In R21.6, there is no ECMP across different owners (for instance across EVPN-IFL and PE-CE BGP), only within the same routing owner.
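The selection rule described in this section can be sketched as follows. This is a minimal illustration, not the SR Linux implementation: the `Candidate` class and `select_active` function are hypothetical, and the behavior when two owners have equal preference is not covered by the text, so the sketch makes no claim about it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical route-table candidate for one prefix from one owner."""
    owner: str          # "bgp" (PE-CE) or "bgp-evpn" (EVPN-IFL)
    preference: int     # lower value wins
    next_hops: tuple


def select_active(candidates):
    """Pick the candidate with the lowest preference.

    Only one owner is ever active, so the next-hops of different owners
    are never merged into a single ECMP set. Tie-breaking on equal
    preference is an artifact of min() here, not documented behavior.
    """
    return min(candidates, key=lambda c: c.preference)


# Values from the 31.31.31.31/32 example after setting "ebgp 160".
pe_ce = Candidate("bgp", 160, ("20.1.1.3",))
evpn_ifl = Candidate("bgp-evpn", 170, ("3.3.3.3",))

active = select_active([evpn_ifl, pe_ce])
print(active.owner, active.next_hops)  # bgp ('20.1.1.3',)
```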
Note the following restrictions:
- BGP PE-CE is not supported on CEs that make use of Ethernet Segments, that is, EVPN multi-homing. This is because the resolution of BGP PE-CE routes using EVPN IFL routes is blocked. The CE can still use non-EVPN multi-homing (L3 multi-homing). For example, the CE can have a link connected to a subinterface in LEAF-1/MAC-VRF-1 and another link to a subinterface in LEAF-2/MAC-VRF-2, because MAC-VRF-1 and MAC-VRF-2 are attached to different subnets.
- BGP PE-CE is not supported when the BGP session is established via VXLAN, that is, when the BGP CE is connected to a leaf that is connected via VXLAN to another leaf that hosts the IRB subinterface where the BGP PE session is established.
Checking multi-homing in an EVPN-VXLAN Layer 3 network
An EVPN-VXLAN Layer 3 network needs to provide a multi-homing solution where upstream and downstream traffic is always routed efficiently, without hair-pinning. As shown in Example of EVPN-VXLAN layer 3 multi-homing, LEAF-2 and LEAF-4 are all-active multi-homed to SERVER-1. The use of IRB anycast-gw IP and MAC addresses, along with the synchronization of MACs and ARPs on the multi-homed leaf nodes, provides efficient routing.
Consistency check for anycast-gw IPs
The configuration of the anycast-gw must be consistent in the IRB sub-interfaces of LEAF-2 and LEAF-4.
Checking anycast-gw IP address consistency
Check configuration consistency
// Leaf-2 irb0.24 subinterface
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state interface irb0 subinterface 24
interface irb0 {
subinterface 24 {
admin-state enable
ip-mtu 1500
name irb0.24
ifindex 1082130457
oper-state up
last-change "2 days ago"
ipv4 {
allow-directed-broadcast false
address 101.1.1.254/24 {
anycast-gw true
origin static
primary
status preferred
}
anycast-gw {
virtual-router-id 1
anycast-gw-mac 00:00:5E:00:01:01
anycast-gw-mac-origin vrid-auto-derived
}
// Leaf-4 irb0.24 subinterface
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state interface irb0 subinterface 24
interface irb0 {
subinterface 24 {
admin-state enable
ip-mtu 1500
name irb0.24
ifindex 1082130457
oper-state up
last-change "2 days ago"
ipv4 {
allow-directed-broadcast false
address 101.1.1.254/24 {
anycast-gw true
origin static
primary
status preferred
}
anycast-gw {
virtual-router-id 1
anycast-gw-mac 00:00:5E:00:01:01
anycast-gw-mac-origin vrid-auto-derived
}
The anycast-gw-mac address is automatically derived by default as 00:00:5e:00:01:VRID per draft-ietf-bess-evpn-inter-subnet-forwarding. It can also be manually configured. Either way, the anycast-gw IP and MAC must match in the two leaf nodes.
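As an illustration of the auto-derivation and of the consistency requirement, the following sketch builds the MAC from the virtual-router-id and compares the anycast-gw settings of two leaf nodes. The function names and the dictionary layout are hypothetical, and the 1..255 range for the VRID is an assumption based on the single-byte suffix.

```python
def anycast_gw_mac(vrid: int) -> str:
    """Derive 00:00:5e:00:01:VRID for a given virtual-router-id."""
    if not 1 <= vrid <= 255:  # assumed valid range (one byte, non-zero)
        raise ValueError("virtual-router-id must be in 1..255")
    return f"00:00:5e:00:01:{vrid:02x}"


def anycast_gw_consistent(leaf_a: dict, leaf_b: dict) -> bool:
    """Check that the anycast-gw IP and MAC match on both leaf nodes."""
    keys = ("anycast_gw_ip", "anycast_gw_mac")
    return all(leaf_a[k].lower() == leaf_b[k].lower() for k in keys)


# Values taken from the irb0.24 state output of LEAF-2 and LEAF-4.
leaf2 = {"anycast_gw_ip": "101.1.1.254/24", "anycast_gw_mac": anycast_gw_mac(1)}
leaf4 = {"anycast_gw_ip": "101.1.1.254/24", "anycast_gw_mac": "00:00:5E:00:01:01"}

print(anycast_gw_mac(1))                    # 00:00:5e:00:01:01
print(anycast_gw_consistent(leaf2, leaf4))  # True
```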
Checking anycast-gw IP address resolution
In the next example, HOST-12 is configured with default-gw 101.1.1.254 (the anycast-gw IP address of BD24). When HOST-12 ARPs for the default-gw IP, the ARP Request can be hashed to either leaf. Regardless of which leaf gets the ARP Request, the ARP reply contains the anycast-gw MAC. Unicast traffic from HOST-12 can now be hashed to either leaf (for example, LEAF-2 in Example of EVPN-VXLAN layer 3 multi-homing) and the receiving leaf node always routes the traffic directly to LEAF-3 without sending it to the peer leaf first (LEAF-4 in the example). Without anycast-gw IP and MAC addresses, traffic would be hair-pinned and would consume spine bandwidth unnecessarily.
The following shows the resolution of the anycast-gw IP from HOST-12 and upstream routed traffic.
anycast-gw IP address resolution
[:host-12]$ arp -n -i veth2
[:host-12]$
[:host-12]$
[:host-12]$ ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
02:58:15.782587 00:00:64:01:01:01 > Broadcast, ethertype ARP (0x0806), length 42:
Request who-has 101.1.1.254 tell 101.1.1.1, length 28
02:58:15.787404 00:00:5e:00:01:01 > 00:00:64:01:01:01, ethertype ARP (0x0806),
length 60: Reply 101.1.1.254 is-at 00:00:5e:00:01:01, length 46
02:58:15.787436 00:00:64:01:01:01 > 00:00:5e:00:01:01, ethertype IPv4 (0x0800),
length 98: 101.1.1.1 > 33.33.33.33: ICMP echo request, id 3140, seq 1, length 64
02:58:15.791393 00:00:5e:00:01:01 > 00:00:64:01:01:01, ethertype IPv4 (0x0800),
length 98: 33.33.33.33 > 101.1.1.1: ICMP echo reply, id 3140, seq 1, length 64
64 bytes from 33.33.33.33: icmp_seq=1 ttl=63 time=8.96 ms
--- 33.33.33.33 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8008ms
rtt min/avg/max/mdev = 2.362/3.792/8.962/1.907 ms
[:host-12]$ arp -n -i veth2
Address HWtype HWaddress Flags Mask Iface
101.1.1.254 ether 00:00:5e:00:01:01 C veth2
Checking synchronization in both multi-home leaf nodes
As shown in Example of EVPN-VXLAN layer 3 multi-homing, downstream routed traffic from LEAF-3 to HOST-12 is routed directly by LEAF-2 or LEAF-4 without hair-pinning, regardless of which leaf receives the packets. This occurs because HOST-12's ARP and MAC entries are synchronized in both multi-homed leaf nodes. LEAF-2 learns 101.1.1.1->00:00:64:01:01:01 (HOST-12 IP and MAC) as dynamic and advertises both in MAC/IP routes that are imported by LEAF-4. LEAF-4 installs the HOST-12 ARP and MAC entries as evpn. However, the MAC points at the local ES lag interface, so forwarding is direct to HOST-12.
Synchronization in both multi-home leaf nodes
// ARP and MAC entries for Leaf-2
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24
+-----------+-----------+----------------+-----------+----------------------+-----------------+
| Interface | Subinterf | Neighbor | Origin | Link layer address | Expiry |
| | ace | | | | |
+===========+===========+================+===========+======================+=================+
| irb0 | 24 | 101.1.1.1 | dynamic | 00:00:64:01:01:01 | 3 hours from now|
| irb0 | 24 | 101.1.1.4 | evpn | 00:01:04:FF:00:41 | |
+-----------+-----------+----------------+-----------+----------------------+-----------------+
---------------------------------------------------------------------------------------------
Total entries : 2 (0 static, 2 dynamic)
----------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table all
----------------------------------------------------------------------------------------------
Mac-table of network instance BD24
-----------------------------------------------------------------------------------------------
+--------------+---------------------+-------+--------+------+-----+--------------------------+
| Address | Destination | Dest- | Type |Active|Aging| Last Update |
| | | Index | | | | |
+==============+=====================+=======+========+======+=====+==========================+
| 00:00:5E:00: | irb | 0 | irb-int| true | N/A | 2021-04-13T10:42:14.000Z |
| 01:01 | | | erface-| | | |
| | | | anycast| | | |
| 00:00:64:01: | lag1.24 | 20 | learnt | true | 285 | 2021-04-15T10:13:11.000Z |
| 01:01 | | | | | | |
| 00:00:66:01: | ethernet-1/12.24 | 17 | learnt | true | 285 | 2021-04-15T10:13:11.000Z |
| 01:01 | | | | | | |
| 00:01:02:FF: | irb | 0 | irb-int| true | N/A | 2021-04-13T10:42:14.000Z |
| 00:41 | | | erface | | | |
| 00:01:04:FF: | vxlan-interface: | 202418| evpn- | true | N/A | 2021-04-13T10:42:54.000Z |
| 00:41 | vxlan1.24 | | | | | |
| | vtep:4.4.4.4 vni:24 | 653897| static | | | |
+--------------+---------------------+-------+--------+------+-----+--------------------------+
Total Irb Macs : 1 Total 1 Active
Total Static Macs : 0 Total 0 Active
Total Duplicate Macs : 0 Total 0 Active
Total Learnt Macs : 2 Total 2 Active
Total Evpn Macs : 0 Total 0 Active
Total Evpn static Macs : 1 Total 1 Active
Total Irb anycast Macs : 1 Total 1 Active
Total Macs : 5 Total 5 Active
-----------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
// ARP and MAC entries for Leaf-4
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24
+-----------+-----------+----------------+-----------+----------------------+-------+
| Interface | Subinterf | Neighbor | Origin | Link layer address | Expiry|
| | ace | | | | |
+===========+===========+================+===========+======================+=======+
| irb0 | 24 | 101.1.1.1 | evpn | 00:00:64:01:01:01 | |
| irb0 | 24 | 101.1.1.2 | evpn | 00:01:02:FF:00:41 | |
+-----------+-----------+----------------+-----------+----------------------+-------+
------------------------------------------------------------------------------------
Total entries : 2 (0 static, 2 dynamic)
------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table all
-------------------------------------------------------------------------------------
Mac-table of network instance BD24
------------------------------------------------------------------------------------
+-------------------+--------------------------+------+--------+------+-----+-----------------+
| Address | Destination |Dest | Type |Active|Aging| Last Update |
| | |Index | | | | |
+===================+==========================+======+========+======+=====+=================+
| 00:00:5E:00:01:01 | irb | 0 |irb-int | true | N/A |2021-04-13T10:42:|
| | | |erface- | | | 38.000Z |
| | | |anycast | | | |
| 00:00:64:01:01:01 | lag1.24 | 18 |evpn | true | N/A |2021-04-15T10:04:|
| | | | | | | 12.000Z |
| 00:00:66:01:01:01 |vxlan-interface:vxlan1.24 |202418|evpn | true | N/A |2021-04-15T10:13:|
| |vtep:2.2.2.2 vni:24 |654989| | | | 12.000Z |
| 00:01:02:FF:00:41 |vxlan-interface:vxlan1.24 |202418|evpn- | true | N/A |2021-04-13T10:42:|
| |vtep:2.2.2.2 vni:24 |654989|static | | | 54.000Z |
| 00:01:04:FF:00:41 |irb |0 |irb-int | true | N/A |2021-04-13T10:42:|
| | | |erface | | | 38.000Z |
+-------------------+--------------------------+------+--------+------+-----+-----------------+
Total Irb Macs : 1 Total 1 Active
Total Static Macs : 0 Total 0 Active
Total Duplicate Macs : 0 Total 0 Active
Total Learnt Macs : 0 Total 0 Active
Total Evpn Macs : 2 Total 2 Active
Total Evpn static Macs : 1 Total 1 Active
Total Irb anycast Macs : 1 Total 1 Active
Total Macs : 5 Total 5 Active
-----------------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Non-anycast-gw IP addresses
In addition to using anycast-gw IPs for the routed traffic, the multi-homed leaf nodes also have non-anycast-gw IPs that can be used for ICMP. The examples that follow check the availability of each individual Leaf IRB (LEAF-2 and LEAF-4).
Checking LEAF-2 IRB availability
LEAF-2 has a non-anycast-gw IP 101.1.1.2:
Check LEAF-2 IRB availability
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show interfaces irb0.24
====================================================================================
Net instance : IP-VRF-10
Interface : irb0.24
Oper state : up
Ip mtu : 1500
Prefix Origin Status
==================================================================================
101.1.1.2/24 static preferred
101.1.1.254/24 static preferred, primary, anycast
102.1.1.254/24 static preferred
2001:db8:24::254/64 static preferred, primary, anycast
fe80::200:5eff:fe00:101/64 link-layer preferred, anycast
fe80::201:2ff:feff:41/64 link-layer preferred
====================================================================================
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
Checking LEAF-4 IRB availability
LEAF-4 has a non-anycast-gw IP 101.1.1.4 in the same IRB:
Check LEAF-4 IRB availability
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show interfaces irb0.24
=====================================================================================
Net instance : IP-VRF-10
Interface : irb0.24
Oper state : up
Ip mtu : 1500
Prefix Origin Status
===================================================================================
101.1.1.4/24 static preferred
101.1.1.254/24 static preferred, primary, anycast
fe80::200:5eff:fe00:101/64 link-layer preferred, anycast
fe80::201:4ff:feff:41/64 link-layer preferred
====================================================================================
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
Checking non-anycast-gw IPs reachability
Both non-anycast-gw IPs are reachable from HOST-12. ARP Requests for non-anycast-gw IPs are answered with the leaf's own IRB interface MAC (for example, 00:01:02:ff:00:41 on LEAF-2) and not with the anycast-gw MAC of the IRB. This allows using the non-anycast-gw IPs for troubleshooting purposes when there are anycast-gw IPs on the same IRBs. The example output from HOST-12 demonstrates this:
non-anycast-gw IPs reachability from host
[:host-12]$ arp -n -i veth2
Address HWtype HWaddress Flags Mask Iface
101.1.1.254 ether 00:00:5e:00:01:01 C veth2
[:host-12]$ ping 101.1.1.2
PING 101.1.1.2 (101.1.1.2) 56(84) bytes of data.
03:25:41.291765 00:00:64:01:01:01 > Broadcast, ethertype ARP (0x0806), length 42:
Request who-has 101.1.1.2 tell 101.1.1.1, length 28
03:25:41.295105 00:01:02:ff:00:41 > 00:00:64:01:01:01, ethertype ARP (0x0806),
length 60: Reply 101.1.1.2 is-at 00:01:02:ff:00:41, length 46
03:25:41.295130 00:00:64:01:01:01 > 00:01:02:ff:00:41, ethertype IPv4 (0x0800),
length 98:101.1.1.1 > 101.1.1.2: ICMP echo request, id 3307, seq 1, length 64
03:25:41.299204 00:01:02:ff:00:41 > 00:00:64:01:01:01, ethertype IPv4 (0x0800),
length 98: 101.1.1.2 > 101.1.1.1: ICMP echo reply, id 3307, seq 1, length 64
64 bytes from 101.1.1.2: icmp_seq=1 ttl=64 time=7.59 ms
--- 101.1.1.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.073/3.684/7.596/2.269 ms
[:host-12]$ arp -n -i veth2
Address HWtype HWaddress Flags Mask Iface
101.1.1.254 ether 00:00:5e:00:01:01 C veth2
101.1.1.2 ether 00:01:02:ff:00:41 C veth2
[:host-12]$ ping 101.1.1.4
PING 101.1.1.4 (101.1.1.4) 56(84) bytes of data.
03:25:52.696934 00:00:64:01:01:01 > Broadcast, ethertype ARP (0x0806), length 42:
Request who-has 101.1.1.4 tell 101.1.1.1, length 28
03:25:52.700615 00:01:04:ff:00:41 > 00:00:64:01:01:01, ethertype ARP (0x0806),
length 60: Reply 101.1.1.4 is-at 00:01:04:ff:00:41, length 46
03:25:52.700649 00:00:64:01:01:01 > 00:01:04:ff:00:41, ethertype IPv4 (0x0800),
length 98: 101.1.1.1 > 101.1.1.4: ICMP echo request, id 3318, seq 1, length 64
03:25:52.703463 00:01:04:ff:00:41 > 00:00:64:01:01:01, ethertype IPv4 (0x0800),
length 98: 101.1.1.4 > 101.1.1.1: ICMP echo reply, id 3318, seq 1, length 64
64 bytes from 101.1.1.4: icmp_seq=1 ttl=64 time=6.64 ms
--- 101.1.1.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.200/3.821/6.648/2.006 ms
[:host-12]$ arp -n -i veth2
Address HWtype HWaddress Flags Mask Iface
101.1.1.4 ether 00:01:04:ff:00:41 C veth2
101.1.1.254 ether 00:00:5e:00:01:01 C veth2
101.1.1.2 ether 00:01:02:ff:00:41 C veth2
Additional anycast gateway considerations
The following guidelines also apply for using anycast-gw in SR Linux.
In a bgp-evpn-enabled MAC-VRF with an IRB subinterface, the following applies whether the IPs are configured as primary, anycast-gw, or neither of these.
All IPv4 and IPv6 addresses associated with the IRB subinterface are advertised in separate MAC/IP routes.
The anycast-gw-mac and its corresponding anycast-gw IP address are advertised in a MAC/IP route.
Any other existing non-anycast-gw IP is advertised along with the interface MAC (hw-mac) in a MAC/IP route.
For example, if irb0.24 is configured in LEAF-2 with anycast-gw (ip,mac)=(101.1.1.254/24, 00:00:5e:00:01:01), and 101.1.1.2/24 is also configured as a non-anycast-gw IP, two MAC/IP routes are advertised in the context of BD24: MAC/IP route (101.1.1.254, 00:00:5e:00:01:01), and MAC/IP route (101.1.1.2, hw-mac-1).
For IPv6, Link-Local Addresses (LLAs) are also advertised in addition to global addresses.
When the IRB subinterface is admin disabled, the IRB MAC addresses are removed from the mac-table (and withdrawn from EVPN). ARP Requests and Neighbor Solicitation messages for the IRB subinterface IP addresses from the hosts connected to the Broadcast Domain are only processed when coming from local subinterfaces. These messages cannot be processed when received over VXLAN, so each of the leaf routers attached to the same BD needs to have its anycast IRB subinterface operationally up to process the requests for the local hosts.
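The pairing of IRB addresses with MACs in the advertised MAC/IP routes can be sketched as below. This is only an illustration of the rule stated in the guidelines above: `mac_ip_routes` is a hypothetical helper, and the addresses and MACs are the LEAF-2 irb0.24 values shown in the outputs of this chapter.

```python
def mac_ip_routes(addresses, anycast_gw_mac, hw_mac):
    """Return the (ip, mac) pairs advertised for one IRB subinterface.

    addresses is a list of (ip, is_anycast_gw) tuples. Anycast-gw IPs are
    paired with the anycast-gw MAC; every other IP is paired with the
    interface MAC (hw-mac). One MAC/IP route is produced per address.
    """
    return [
        (ip, anycast_gw_mac if is_anycast_gw else hw_mac)
        for ip, is_anycast_gw in addresses
    ]


leaf2_irb_addresses = [("101.1.1.254", True), ("101.1.1.2", False)]
routes = mac_ip_routes(
    leaf2_irb_addresses,
    anycast_gw_mac="00:00:5e:00:01:01",
    hw_mac="00:01:02:ff:00:41",
)
for ip, mac in routes:
    print(ip, mac)
```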
Checking protection flags per MAC address
The IRB MAC addresses are protected in the mac-table if they are not anycast-gw MACs. Protection means that received frames are dropped if their source MAC (MAC SA) matches a protected MAC. The mac-table state shows the protection flag per MAC.
mac-table state
--{ [FACTORY] + candidate shared default }--[ ]--
# info from state network-instance BD24 bridge-table mac-table mac *
network-instance BD24 {
bridge-table {
mac-table {
mac 00:00:5E:00:01:01 {
destination-type irb-interface
destination-index 0
type irb-interface-anycast
last-update "2 days ago"
destination irb
is-protected false
}
mac 00:01:02:FF:00:41 {
destination-type irb-interface
destination-index 0
type irb-interface
last-update "2 days ago"
destination irb
is-protected true
}
mac 00:01:04:FF:00:41 {
destination-type vxlan
destination-index 202418653897
type evpn-static
last-update "2 days ago"
destination "vxlan-interface:vxlan1.24 vtep:4.4.4.4 vni:24"
is-protected true
}
}
}
}
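As a rough illustration of the protection rule, the following Python sketch decides whether a received frame is dropped based on its source MAC. It is not SR Linux code: the should_drop function and the dictionary layout are hypothetical, and the entries simply copy the is-protected values from the output above.

```python
from typing import Dict

# MAC -> is-protected flag, copied from the BD24 mac-table state above
MAC_TABLE: Dict[str, bool] = {
    "00:00:5E:00:01:01": False,  # type irb-interface-anycast
    "00:01:02:FF:00:41": True,   # type irb-interface
    "00:01:04:FF:00:41": True,   # type evpn-static
}


def should_drop(src_mac: str, mac_table: Dict[str, bool] = MAC_TABLE) -> bool:
    """Drop a received frame when its MAC SA matches a protected entry."""
    # Unknown source MACs are not protected, so they are not dropped here.
    return mac_table.get(src_mac.upper(), False)


print(should_drop("00:01:02:ff:00:41"))  # local IRB MAC used as SA: True
print(should_drop("00:00:5e:00:01:01"))  # anycast-gw MAC is not protected: False
```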
Testing and checking Layer 3 host mobility
In EVPN-VXLAN Layer 3 networks, multiple leaf nodes are attached to the same BD. Hosts of the same subnet can be connected to any of those leaf nodes. They can also move between leaf nodes of the same BD. In either case, the upstream and downstream traffic must be efficient and avoid hair-pinning. This is shown in the following figure, where LEAF-2 and LEAF-4 configurations are modified (no ES) and HOST-12 was originally connected to LEAF-2.
Upstream traffic from HOST-12 to HOST-3 must be routed by LEAF-2 to LEAF-3 directly. If HOST-12 later moves to LEAF-4, upstream traffic to HOST-3 must be routed by LEAF-4 to LEAF-3 directly. This is accomplished using anycast-gw IPs and MACs on the IRB interfaces.
When HOST-12 is attached to LEAF-2, downstream traffic from HOST-3 must be sent from LEAF-3 to LEAF-2 directly. If HOST-12 later moves to LEAF-4, the routers need to update their tables quickly so that LEAF-3 routes the traffic to LEAF-4 directly, and no bandwidth is wasted on the spines because of unnecessary hair-pinning. This is achieved by learning HOST-12's IP address in the route-table of the connected leaf as a /32 route and advertising that host route in an EVPN IFL route.
Upon a mobility event to LEAF-4, LEAF-2 withdraws the host route as fast as possible and LEAF-4 then advertises the HOST-12 host route in an EVPN IFL route.
Configuring efficient host routing
In the initial configuration, HOST-12 is connected to LEAF-2. For LEAF-3 to route traffic (to HOST-12) directly to LEAF-2, LEAF-2 needs to learn HOST-12's IP and advertise its host route in an EVPN-IFL route.
In the next example, the following parameter definitions apply.
learn-unsolicited true - Triggers the node to snoop and process all solicited and unsolicited ARP messages received on sub-interfaces (not VXLAN) and to learn the corresponding ARP/ND entries as 'dynamic'. By default, the command is false and only solicited entries are learned, which does not guarantee host mobility.
When enabled, dynamic ARP/ND entries are learned from the following messages received on the sub-interfaces (if the IPs fall into the local subnets):
ARP and Neighbor Solicitation requests
Gratuitous ARP requests and unsolicited neighbor advertisements
host-route populate dynamic - Triggers the creation of arp-nd host routes in the IP-VRF-10 route-table out of dynamic ARP entries. These are disabled by default. The arp-nd host routes are not installed in the FIB. They are only used in the control plane and advertised to the EVPN-IFL network to attract traffic from LEAF-3. The arp-nd host routes can be exported in any routing protocol, such as EVPN-IFL routes, BGP IPv4/IPv6 routes, OSPF, and IS-IS. They are supported in network-instances ip-vrf and default.
evpn advertise dynamic - Triggers the advertisement of EVPN MAC/IP routes for the dynamically learned ARP entries and allows the synchronization of the ARP entries in all IRB sub-interfaces of the same BD. This is only supported on IRB sub-interfaces.
The MAC/IP routes that are advertised for ARP/ND entries have the S bit set if the corresponding MAC entry in the mac-table is static.
Note that an equivalent command can be used for ND entries.
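The learning rule for learn-unsolicited can be sketched in Python as follows. This is an illustrative model, not SR Linux code: the learns_dynamic_entry function and its parameters are hypothetical, and it assumes that messages received over VXLAN never create a dynamic entry.

```python
import ipaddress
from typing import Iterable


def learns_dynamic_entry(
    sender_ip: str,
    local_subnets: Iterable[str],
    received_over_vxlan: bool,
    solicited: bool,
    learn_unsolicited: bool,
) -> bool:
    """Return True if the message would create a dynamic ARP/ND entry."""
    # Assumption: only messages received on local sub-interfaces are learned.
    if received_over_vxlan:
        return False
    # The sender IP must fall into one of the local subnets.
    ip = ipaddress.ip_address(sender_ip)
    if not any(ip in ipaddress.ip_network(s) for s in local_subnets):
        return False
    # Solicited replies are always learned; unsolicited ones need the knob.
    return solicited or learn_unsolicited


subnets = ["101.1.1.0/24"]
# HOST-12 sends an ARP Request for a random IP in the subnet (unsolicited)
print(learns_dynamic_entry("101.1.1.1", subnets, False, False, True))
print(learns_dynamic_entry("101.1.1.1", subnets, False, False, False))
```

With learn-unsolicited true the first call returns True; with the default setting the same message is not learned, which is why the default does not guarantee host mobility.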
Efficient host routing model
# subinterface 24
--{ [FACTORY] + candidate shared default }--[ interface irb0 subinterface 24 ]--
# info
ipv4 {
address 101.1.1.2/24 {
}
address 101.1.1.254/24 {
anycast-gw true
primary
}
arp {
learn-unsolicited true
debug [
messages
]
host-route {
populate dynamic {
}
}
evpn {
advertise dynamic {
}
}
}
}
anycast-gw {
}
The next examples show how an ARP Request from HOST-12 to a random IP in the subnet is enough for irb0.24 to learn the dynamic ARP entry. LEAF-2 can then create a host route that is advertised as an EVPN IFL route and imported by LEAF-3.
LEAF-2 - HOST-12 unsolicited ARP Request
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24
+-----------+--------------+----------+-------+--------------------+------+
| Interface | Subinterface | Neighbor | Origin| Link layer address |Expiry|
+===========+==============+==========+=======+====================+======+
| irb0 | 24 |101.1.1.4 | evpn | 00:01:04:FF:00:41 | |
+-----------+--------------+----------+-------+--------------------+------+
--------------------------------------------------------------------------
Total entries : 1 (0 static, 1 dynamic)
--------------------------------------------------------------------------
Debug messages - ARP request received
2021-04-15T04:58:05.651861-07:00 dut2 local6|INFO sr_arp_nd_mgr: arpnd|1773|1773|19334|I:
Received ARP request on interface irb0.24 (10247 - 22) from datapath. Source Mac :
00:00:64:01:01:01 Source IP : 101.1.1.1 Target Mac : 00:00:00:00:00:00 Target IP :
101.1.1.200
Triggered learning of RT5 and RT2 advertisements
2021-04-15T04:58:06.019955-07:00 dut2 local6|DEBU sr_bgp_mgr: bgp|4933|5176|2128215|D:
VR default (1) Peer 1: 3.3.3.3 UPDATE: Peer 1: 3.3.3.3 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 85
Flag: 0x90 Type: 14 Len: 48 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 2.2.2.2
Type: EVPN-IP-PREFIX Len: 34 RD: 2.2.2.2:10, tag: 0, ip_prefix: 101.1.1.1/32
gw_ip 0.0.0.0 Label: 10
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 24 Extended Community:
target:64500:10
mac-nh:00:01:02:ff:00:00
bgp-tunnel-encap:VXLAN
2021-04-15T04:58:06.020003-07:00 dut2 local6|DEBU sr_bgp_mgr: bgp|4933|5176|2128216|D:
VR default (1) Peer 1: 3.3.3.3 UPDATE: Peer 1: 3.3.3.3 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 97
Flag: 0x90 Type: 14 Len: 45 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 2.2.2.2
Type: EVPN-MAC Len: 37 RD: 2.2.2.2:24 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:00:64:01:01:01, IP len: 4, IP: 101.1.1.1, label1: 24
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:24
bgp-tunnel-encap:VXLAN
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24
+---------+----------+----------+---------+-------------------+------------------+
|Interface| Sub- | Neighbor | Origin | Link layer | Expiry |
| |interface | | | address | |
+=========+==========+==========+=========+===================+==================+
| irb0 | 24 |101.1.1.1 | dynamic | 00:00:64:01:01:01 | 3 hours from now |
| irb0 | 24 |101.1.1.4 | evpn | 00:01:04:FF:00:41 | |
+---------+----------+----------+---------+-------------------+------------------+
---------------------------------------------------------------------------------
Total entries : 2 (0 static, 2 dynamic)
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show route-table ipv4-unicast prefix 101.1.1.1/32 detail
--------------------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
-----------------------------------------------------------------------------
Destination : 101.1.1.1/32
ID : 0
Route Type : arp-nd
Metric : 0
Preference : 1
Active : true
Last change : 2021-04-15T11:58:05.653Z
Resilient hash: false
---------------------------------------------------------------------------------
Next hops: 1 entries
101.1.1.1 (direct) via [irb0.24]
----------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
LEAF-3 imports routes as bgp-evpn host route
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
# show route-table ipv4-unicast summary
------------------------------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
------------------------------------------------------------------------------------------
+----------------+------+-------+----------+-------+-----+--------------------+----------+
| Prefix | ID |Active | Route | Metric| Pref| Next-hop | Next-hop |
| | | | Type | | | (Type) |Interface |
+================+======+=======+==========+=======+=====+====================+==========+
| 20.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 20.1.1.3/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 22.22.22.22/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 31.31.31.31/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 31.31.31.31/32 | 1 | false | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 33.33.33.33/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 44.44.44.44/32 | 0 | true | bgp-evpn | 0 | 170 | 4.4.4.4 (indirect) | None |
| 101.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| | | | | | | 4.4.4.4 (indirect) | None |
| 101.1.1.1/32 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 102.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 2.2.2.2 (indirect) | None |
| 103.1.1.0/24 | 0 | true | local | 0 | 0 | 103.1.1.3 (direct) | irb0.3 |
| 103.1.1.1/32 | 0 | true | arp-nd | 0 | 1 | 103.1.1.1 (direct) | irb0.3 |
| 103.1.1.3/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 103.1.1.254/32 | 0 | true | host | 0 | 0 | None (extract) | None |
| 103.1.1.255/32 | 0 | true | host | 0 | 0 | None (broadcast) | None |
| 104.1.1.0/24 | 0 | true | bgp-evpn | 0 | 170 | 4.4.4.4 (indirect) | None |
+----------------+------+-------+-----------+------+-----+--------------------+----------+
-----------------------------------------------------------------------------------------
16 IPv4 routes total
15 IPv4 prefixes with active routes
1 IPv4 prefixes with active ECMP routes
----------------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ network-instance IP-VRF-10 ]--
Mobility event - efficient host routing
When HOST-12 is attached to LEAF-2, the ARP entry must be maintained even if HOST-12 does not send any traffic. If the entry is removed or ages out, the associated arp-nd host route in IP-VRF-10 is removed and the EVPN-IFL route withdrawn. This can cause hair-pinning for traffic routed from LEAF-3. To maintain the HOST-12 ARP entry (and other dynamic ARP/ND entries), the system supports timer-based ARP/ND refreshes (ARP-Request for the host IP).
Timer-based refreshes are triggered 30 seconds before the ARP age-out timer expires, and irrespective of the arrival of packets requiring resolution for the entry. Note that in SR OS, the arp-proactive-refresh command is needed so that entries are always refreshed irrespective of the arrival of packets that hit the entry. In SR Linux, this is the default behavior, so there is no command to enable the timer-based refreshes.
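The timing of a timer-based refresh is simple arithmetic, sketched below in Python. The timer_refresh_time function is a hypothetical helper, and the 14400-second age-out is an assumed value chosen only for illustration; only the 30-second lead comes from the text above.

```python
from datetime import datetime, timedelta

# Refreshes are sent 30 seconds before the age-out timer expires.
REFRESH_LEAD = timedelta(seconds=30)


def timer_refresh_time(last_learned: datetime, age_out_seconds: int) -> datetime:
    """Return when the timer-based ARP/ND refresh is sent for an entry."""
    if age_out_seconds <= REFRESH_LEAD.total_seconds():
        raise ValueError("age-out timer must be longer than the refresh lead")
    return last_learned + timedelta(seconds=age_out_seconds) - REFRESH_LEAD


learned = datetime(2021, 4, 15, 12, 0, 0)
# Assumed age-out of 14400 seconds (4 hours)
print(timer_refresh_time(learned, 14400))  # 2021-04-15 15:59:30
```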
When HOST-12 moves from LEAF-2 to LEAF-4, LEAF-4 must advertise the host route for 101.1.1.1/32 in EVPN-IFL as fast as possible and LEAF-2 must withdraw its EVPN-IFL route for it. The process used by LEAF-2 and LEAF-4 to update their ARP/route-tables when HOST-12 moves between them is called "EVPN Layer 3 host mobility". SR Linux provides this support per section 4 of draft-ietf-bess-evpn-inter-subnet-forwarding. EVPN Layer 3 host mobility supports the three cases specified in the draft:
HOST-12 moves to LEAF-4 and generates a GARP
HOST-12 moves to LEAF-4 and generates traffic, but not ARP
HOST-12 moves to LEAF-4 and remains silent
To support fast mobility, SR Linux supports triggered refreshes. Triggered refreshes (ARP Requests sent on events rather than on timer expiration) are issued from the irb0.24 subinterface of the leaf nodes for the existing dynamic ARP entry 101.1.1.1->00:00:64:01:01:01. A refresh is triggered by any of the following events:
an EVPN MAC/IP route for 101.1.1.1->00:00:64:01:01:01 is received
an EVPN route for 00:00:64:01:01:01 (no IP) is received
00:00:64:01:01:01 ages out in the mac-table (or the entry in the MAC table is cleared manually)
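The three trigger events can be modeled with a small Python sketch. It is a simplified illustration and not SR Linux code: the Event enum and the refresh_targets function are hypothetical names, and the model only decides which dynamic ARP entries would be refreshed.

```python
from enum import Enum
from typing import Dict, List, Optional


class Event(Enum):
    EVPN_MAC_IP_ROUTE = "EVPN MAC/IP route received"
    EVPN_MAC_ROUTE = "EVPN MAC route (no IP) received"
    MAC_AGED_OR_CLEARED = "MAC aged out or cleared in the mac-table"


def refresh_targets(
    event: Event,
    mac: str,
    dynamic_arp: Dict[str, str],
    ip: Optional[str] = None,
) -> List[str]:
    """Return the IPs of dynamic ARP entries that get a triggered ARP Request."""
    mac = mac.lower()
    if event is Event.EVPN_MAC_IP_ROUTE:
        # Only the entry matching both the IP and the MAC of the route.
        if ip is not None and dynamic_arp.get(ip, "").lower() == mac:
            return [ip]
        return []
    # MAC-only route or mac-table event: every dynamic entry using that MAC.
    return sorted(a for a, m in dynamic_arp.items() if m.lower() == mac)


# Dynamic ARP entry for HOST-12 on irb0.24
arp = {"101.1.1.1": "00:00:64:01:01:01"}
print(refresh_targets(Event.MAC_AGED_OR_CLEARED, "00:00:64:01:01:01", arp))
print(refresh_targets(Event.EVPN_MAC_IP_ROUTE, "00:00:64:01:01:01", arp, ip="101.1.1.1"))
```

Both calls return ['101.1.1.1'], meaning an ARP Request for HOST-12 is sent in each case.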
As shown in Example of L3 host mobility, when HOST-12 moves to LEAF-4 and issues a GARP or Ethernet traffic, the advertised routes immediately update the ARP/route tables on both leaf nodes. LEAF-3 then changes its next-hop for HOST-12 from LEAF-2 to LEAF-4.
Silent move - HOST-12 initially attached to LEAF-2
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24 ipv4-address 101.1.1.1
+---------+----------+-----------+---------+-------------------+-----------------+
|Interface| Sub- | Neighbor | Origin | Link layer | Expiry |
| |interface | | | address | |
+=========+==========+===========+=========+===================+=================+
| irb0 | 24 | 101.1.1.1 | dynamic | 00:00:64:01:01:01 | 3 hours from now|
+---------+----------+-----------+---------+-------------------+-----------------+
---------------------------------------------------------------------------------
Total entries : 1 (0 static, 1 dynamic)
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table mac 00:00:64:01:01:01
---------------------------------------------------------------------------------
Mac-table of network instance BD24
---------------------------------------------------------------------------------
Mac : 00:00:64:01:01:01
Destination : lag1.24
Dest Index : 20
Type : learnt
Programming Status : Success
Aging : 2680
Last Update : 2021-04-15T14:32:45.000Z
Duplicate Detect time : N/A
Hold down time remaining: N/A
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Silent move - initial LEAF-4
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24 ipv4-address 101.1.1.1
+---------+----------+-----------+---------+-------------------+-----------------+
|Interface| Sub- | Neighbor | Origin | Link layer | Expiry |
| |interface | | | address | |
+=========+==========+===========+=========+===================+=================+
| irb0 | 24 | 101.1.1.1 | evpn | 00:00:64:01:01:01 | |
+---------+----------+-----------+---------+-------------------+-----------------+
---------------------------------------------------------------------------------
Total entries : 1 (0 static, 1 dynamic)
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table mac 00:00:64:01:01:01
---------------------------------------------------------------------------------
Mac-table of network instance BD24
---------------------------------------------------------------------------------
Mac : 00:00:64:01:01:01
Destination : vxlan-interface:vxlan1.24 vtep:2.2.2.2 vni:24
Dest Index : 202418654989
Type : evpn
Programming Status : Success
Aging : N/A
Last Update : 2021-04-15T14:15:00.000Z
Duplicate Detect time : N/A
Hold down time remaining: N/A
-----------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Silent move - watch command output for LEAF-3
Every 2.0s: show route-table ipv4-unicast prefix 101.1.1.1/32 detail
(Executions 903, Thu 07:41:40AM)
---------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
---------------------------------------------------------------------
Destination : 101.1.1.1/32
ID : 0
Route Type : bgp-evpn
Metric : 0
Preference : 170
Active : true
Last change : 2021-04-15T14:15:00.450Z
Resilient hash: false
------------------------------------------------------------------
Next hops: 1 entries
2.2.2.2 (indirect) resolved by None (None)
------------------------------------------------------------------
Silent move - move HOST-12 to LEAF-4
In this example, HOST-12 is moved to LEAF-4 to simulate a silent move. Immediately after flushing MAC 00:00:64:01:01:01 in LEAF-2, the MAC/IP routes are withdrawn and LEAF-2 issues three triggered refreshes.
--{ [FACTORY] + candidate shared default }--[ ]--
# 2021-04-15T07:43:16.422816-07:00 dut2 local6|INFO sr_arp_nd_mgr: arpnd|1773|1773|20438|I:
Sending ARP request on interface irb0.24 (10247 - 22). Source Mac : 00:00:5E:00:01:01
Source IP : 101.1.1.254 Target Mac : 00:00:00:00:00:00 Target IP : 101.1.1.1 Ethernet
SA 00:00:5E:00:01:01 Ethernet DA FF:FF:FF:FF:FF:FF
2021-04-15T07:43:16.422816-07:00 dut2 local6|INFO sr_arp_nd_mgr: arpnd|1773|1773|20438|I:
Sending ARP request on interface irb0.24 (10247 - 22). Source Mac : 00:00:5E:00:01:01
Source IP : 101.1.1.254 Target Mac : 00:00:00:00:00:00 Target IP : 101.1.1.1 Ethernet
SA 00:00:5E:00:01:01 Ethernet DA FF:FF:FF:FF:FF:FF
2021-04-15T07:43:16.422816-07:00 dut2 local6|INFO sr_arp_nd_mgr: arpnd|1773|1773|20438|I:
Sending ARP request on interface irb0.24 (10247 - 22). Source Mac : 00:00:5E:00:01:01
Source IP : 101.1.1.254 Target Mac : 00:00:00:00:00:00 Target IP : 101.1.1.1 Ethernet
SA 00:00:5E:00:01:01 Ethernet DA FF:FF:FF:FF:FF:FF
Flag: 0x90 Type: 15 Len: 77 Multiprotocol Unreachable NLRI:
Address Family EVPN
Type: EVPN-MAC Len: 37 RD: 2.2.2.2:24 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:00:64:01:01:01, IP len: 4, IP: 101.1.1.1, label1: 0
Type: EVPN-MAC Len: 33 RD: 2.2.2.2:24 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:00:64:01:01:01, IP len: 0, IP: NULL, label1: 0
Silent move - LEAF-2 updates
When the refreshes arrive at HOST-12 in LEAF-4, the ARP reply is consumed by LEAF-4 (since the MAC destination address matches the anycast-gw MAC address). LEAF-4 then advertises the MAC/IP routes and IP Prefix route for HOST-12.
Type: EVPN-IP-PREFIX Len: 34 RD: 4.4.4.4:10, tag: 0, ip_prefix: 101.1.1.1/32
gw_ip 0.0.0.0 Label: 10
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 24 Extended Community:
target:64500:10
mac-nh:00:01:04:ff:00:00
bgp-tunnel-encap:VXLAN
2021-04-15T07:43:18.763551-07:00 dut2 local6|DEBU sr_bgp_mgr: bgp|4933|5176|2169872|D:
VR default (1) Peer 1: 4.4.4.4 UPDATE: Peer 1: 4.4.4.4 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 97
Flag: 0x90 Type: 14 Len: 45 Multiprotocol Reachable NLRI:
Address Family EVPN
NextHop len 4 NextHop 4.4.4.4
Type: EVPN-MAC Len: 37 RD: 4.4.4.4:24 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:00:64:01:01:01, IP len: 4, IP: 101.1.1.1, label1: 24
Type: EVPN-MAC Len: 33 RD: 4.4.4.4:24 ESI: ESI-0, tag: 0, mac len: 48 mac:
00:00:64:01:01:01, IP len: 0, IP: NULL, label1: 24
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64500:24
bgp-tunnel-encap:VXLAN
Type: EVPN-IP-PREFIX Len: 34 RD: 4.4.4.4:10, tag: 0, ip_prefix:
101.1.1.1/32 gw_ip 0.0.0.0 Label: 10
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 24 Extended Community:
target:64500:10
mac-nh:00:01:04:ff:00:00
bgp-tunnel-encap:VXLAN
2021-04-15T07:43:18.766923-07:00 dut2 local6|INFO sr_arp_nd_mgr:
arpnd|1773|1773|20450|I: The ARP entry for 101.1.1.1 has been updated.
After the move, LEAF-2 and LEAF-4 tables are updated, and LEAF-3 points at LEAF-4 as the next-hop for the HOST-12 route.
Silent move - LEAF-2 tables
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24 ipv4-address 101.1.1.1
+---------+----------+-----------+---------+-------------------+-----------------+
|Interface| Sub- | Neighbor | Origin | Link layer | Expiry |
| |interface | | | address | |
+=========+==========+===========+=========+===================+=================+
| irb0 | 24 | 101.1.1.1 | evpn | 00:00:64:01:01:01 | |
+---------+----------+-----------+---------+-------------------+-----------------+
----------------------------------------------------------------------------------
Total entries : 1 (0 static, 1 dynamic)
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table mac 00:00:64:01:01:01
--------------------------------------------------------------------------------
Mac-table of network instance BD24
--------------------------------------------------------------------------------
Mac : 00:00:64:01:01:01
Destination : vxlan-interface:vxlan1.24 vtep:4.4.4.4 vni:24
Dest Index : 202418653897
Type : evpn
Programming Status : Success
Aging : N/A
Last Update : 2021-04-15T14:43:18.000Z
Duplicate Detect time : N/A
Hold down time remaining: N/A
--------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Silent move - LEAF-4 tables
--{ [FACTORY] + candidate shared default }--[ ]--
# show arpnd arp-entries interface irb0 subinterface 24 ipv4-address 101.1.1.1
+---------+----------+-----------+---------+------------------+------------------+
|Interface| Sub- | Neighbor | Origin | Link layer | Expiry |
| |interface | | | address | |
+=========+==========+===========+=========+==================+==================+
| irb0 | 24 | 101.1.1.1 | dynamic | 00:00:64:01:01:01| 3 hrs from now |
+---------+----------+-----------+---------+------------------+------------------+
----------------------------------------------------------------------------------
Total entries : 1 (0 static, 1 dynamic)
---------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
--{ [FACTORY] + candidate shared default }--[ ]--
# show network-instance BD24 bridge-table mac-table mac 00:00:64:01:01:01
---------------------------------------------------------------------------------
Mac-table of network instance BD24
---------------------------------------------------------------------------------
Mac : 00:00:64:01:01:01
Destination : lag1.24
Dest Index : 18
Type : learnt
Programming Status : Success
Aging : 2875
Last Update : 2021-04-15T14:43:18.000Z
Duplicate Detect time : N/A
Hold down time remaining: N/A
-----------------------------------------------------------------------------------
--{ [FACTORY] + candidate shared default }--[ ]--
Silent move - watch command output for LEAF-3
Every 2.0s: show route-table ipv4-unicast prefix 101.1.1.1/32 detail
(Executions 976, Thu 07:44:30AM)
-------------------------------------------------------------------
IPv4 Unicast route table of network instance IP-VRF-10
------------------------------------------------------------------
Destination : 101.1.1.1/32
ID : 0
Route Type : bgp-evpn
Metric : 0
Preference : 170
Active : true
Last change : 2021-04-15T14:43:19.767Z
Resilient hash: false
-------------------------------------------------------------------
Next hops: 1 entries
4.4.4.4 (indirect) resolved by None (None)
-------------------------------------------------------------------
EVPN-VXLAN Layer 3 feature parity for IPv6 prefixes
All the features discussed in this chapter are supported for IPv6 prefixes and hosts. EVPN IFL works for IPv6 prefix routes without enabling a separate BGP family, because the EVPN family supports both IPv4 and IPv6 routes. In addition, all IRB sub-interfaces must be configured with the ipv6 container, using the same commands used earlier in this chapter but configured under "neighbor-discovery".
Configuring IPv6 Container
IPv6 container configuration
--{ [FACTORY] + candidate shared default }--[ interface irb0 subinterface 24 ]--
# info
ipv4 {
address 101.1.1.2/24 {
}
address 101.1.1.254/24 {
anycast-gw true
primary
}
address 102.1.1.254/24 {
}
arp {
learn-unsolicited true
debug [
messages
]
host-route {
populate dynamic {
}
}
evpn {
advertise dynamic {
}
}
}
}
ipv6 {
address 2001:db8:24::254/64 {
anycast-gw true
}
neighbor-discovery {
learn-unsolicited both
host-route {
populate dynamic {
}
}
evpn {
advertise dynamic {
}
}
}
}
anycast-gw {
}
Additional feature parity considerations
The anycast-gw container is common for IPv4 and IPv6. Therefore, the anycast-gw mac is the same for both families. Only one anycast-gw MAC is programmed in the interface, and IPv4 and IPv6 packets use this anycast-gw-mac as MAC SA when sourcing packets to the BD.
LLA and global addresses are advertised in EVPN. The command neighbor-discovery learn-unsolicited both includes global and link local addresses.
The following example shows that when anycast-gw is enabled, an anycast-gw LLA is automatically generated. The anycast-gw IPv6 link local address is derived from the anycast-gw-mac when the anycast-gw and the ipv6 containers are present. The logic to compute this new anycast-gw IPv6 link local address is the same as the logic used to compute the regular IPv6 LLA, except that the anycast-gw-mac is used instead of the interface MAC. This new IPv6 LLA appears in the list of IPv6 addresses associated with the subinterface, but with the attribute anycast-gw true.
Multicast NS messages use the anycast-gw LLA and anycast-gw MAC. Unicast NS messages use the global IPv6 address and the hw-address.
LLA generation
--{ [FACTORY] + candidate shared default }--[ interface irb0 subinterface 24 ipv6 ]--
# info from state
address 2001:db8:24::254/64 {
anycast-gw true
origin static
primary
status preferred
}
address fe80::200:5eff:fe00:101/64 {
anycast-gw true
origin link-layer
status preferred
}
address fe80::201:2ff:feff:41/64 {
origin link-layer
status preferred
}
neighbor-discovery {
duplicate-address-detection true
reachable-time 30
stale-time 14400
learn-unsolicited both
neighbor fe80::201:4ff:feff:41 {
link-layer-address 00:01:04:FF:00:41
origin evpn
}
host-route {
populate dynamic {
}
}
evpn {
advertise dynamic {
admin-tag 0
}
}
}
router-advertisement {
router-role {
current-hop-limit 64
managed-configuration-flag false
other-configuration-flag false
max-advertisement-interval 600
min-advertisement-interval 200
reachable-time 0
retransmit-time 0
router-lifetime 1800
}
}
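The link local addresses in the output above are consistent with the standard modified EUI-64 derivation from a MAC address. The following Python sketch reproduces them. It assumes that derivation is the "regular IPv6 LLA" logic referred to in the text, and the eui64_link_local function is a hypothetical helper, not an SR Linux API.

```python
import ipaddress


def eui64_link_local(mac: str) -> str:
    """Build an fe80::/64 address from a MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to form the interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    packed = b"\xfe\x80" + b"\x00" * 6 + interface_id
    return str(ipaddress.IPv6Address(packed))


# anycast-gw MAC -> anycast-gw LLA
print(eui64_link_local("00:00:5e:00:01:01"))  # fe80::200:5eff:fe00:101
# interface MAC (hw-mac) -> regular LLA
print(eui64_link_local("00:01:02:ff:00:41"))  # fe80::201:2ff:feff:41
```

The same function applied to 00:01:04:FF:00:41 gives fe80::201:4ff:feff:41, which matches the neighbor entry learned from EVPN.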