EVPN Sticky ECMP for IP Prefix Routes

This chapter provides information about EVPN sticky ECMP for IP prefix routes.

Applicability

The information and the configuration in this chapter are based on SR OS Release 25.3.R1. Sticky ECMP for BGP routes is supported in SR OS Release 19.10 and later. Sticky ECMP for EVPN IFL and EVPN IFF routes is supported in SR OS Release 23.10 and later.

EVPN sticky ECMP can be combined with weighted ECMP and IP aliasing ECMP. For weighted ECMP, see the EVPN unequal ECMP for RT5 IFL and IFF routes chapter. For IP aliasing ECMP, see the EVPN IP Aliasing for IP Prefix Routes chapter.

Overview

Weighted traffic distribution toward anycast services network shows the flow distribution in an EVPN network toward an anycast services network. The ECMP distribution of the flows from the border leaf BL-1 toward the Top of Racks (TORs) is weighted. For example, TOR-4 advertises a weight of 3 (expressed as a next-hop count of 3 in the EVPN Link Bandwidth Extended Community), which means it receives three times as many flows as TOR-5 and as the combined total of TOR-2 and TOR-3. CNF-6 has an EBGP session with TOR-2, but not with TOR-3. With IP aliasing ECMP configured between TOR-2 and TOR-3, TOR-3 forwards traffic to CNF-6 without tromboning via TOR-2.
Note: CNF stands for Containerized Network Function. In this example, the CNFs are simulated by SR OS nodes.
Figure 1. Weighted traffic distribution toward anycast services network

Sticky ECMP is implemented in software. On FP-based platforms, each sticky route takes 64 next-hop hashing buckets in the data path. Without sticky ECMP, when CNF-10 is removed or a network failure makes CNF-10 unreachable, TOR-5 withdraws the IP prefix route for prefix 10.10.0.0/24 and the flows are redistributed over the remaining next-hops according to the weighted ECMP set. Existing flows to TOR-2, TOR-3, or TOR-4 may be affected, and with them the application TCP sessions to the CNFs. When sticky ECMP is configured, the existing flows via TOR-2, TOR-3, or TOR-4 remain unchanged; only the flows via TOR-5 are affected and must be redistributed over the remaining routes, as shown in Redistributed traffic flows after CNF-10 is removed. The table Sticky ECMP flow distribution when one next-hop is removed for 10.10.0.0/24 in the Appendix shows the distribution over the internal hashing buckets.

Figure 2. Redistributed traffic flows after CNF-10 is removed
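
The redistribution described above can be illustrated with a small bucket model. The following Python sketch is a simplified illustration only and does not represent the SR OS data-path algorithm: the 64-bucket count and the weights (1 for TOR-2/TOR-3 combined, 3 for TOR-4, 1 for TOR-5) come from this chapter, while the largest-remainder split, the flat bucket layout, and all function names are assumptions made for the example.

```python
BUCKETS = 64  # buckets per sticky route on FP-based platforms (from the text)


def allocate(weights, n):
    """Split n buckets over next-hops in proportion to weight (largest remainder)."""
    total = sum(weights.values())
    shares = {nh: divmod(n * w, total) for nh, w in weights.items()}
    counts = {nh: q for nh, (q, _) in shares.items()}
    leftover = n - sum(counts.values())
    # Hand leftover buckets to the next-hops with the largest remainders.
    by_remainder = sorted(shares, key=lambda nh: shares[nh][1], reverse=True)
    for nh in by_remainder[:leftover]:
        counts[nh] += 1
    return counts


def build_table(weights, n=BUCKETS):
    """Lay the buckets out as a flat list, one next-hop name per bucket."""
    table = []
    for nh, count in allocate(weights, n).items():
        table.extend([nh] * count)
    return table


def remove_sticky(table, weights, gone):
    """Reassign only the buckets that pointed at the withdrawn next-hop."""
    remaining = {nh: w for nh, w in weights.items() if nh != gone}
    freed = [i for i, nh in enumerate(table) if nh == gone]
    refill = build_table(remaining, len(freed))
    new_table = list(table)
    for i, nh in zip(freed, refill):
        new_table[i] = nh
    return new_table


weights = {"TOR-2/TOR-3": 1, "TOR-4": 3, "TOR-5": 1}
before = build_table(weights)

# Sticky: only the buckets of TOR-5 are touched.
sticky = remove_sticky(before, weights, "TOR-5")
# Non-sticky (in this toy model): the whole table is rebuilt from scratch.
rehashed = build_table({"TOR-2/TOR-3": 1, "TOR-4": 3})

moved_sticky = [i for i in range(BUCKETS) if before[i] != sticky[i]]
moved_rehash = [i for i in range(BUCKETS) if before[i] != rehashed[i]]
print(len(moved_sticky), len(moved_rehash))
```

In this model, every bucket that changes in the sticky case previously pointed at TOR-5, whereas the full rebuild also moves some buckets that pointed at a surviving next-hop. The Appendix table remains the reference for the actual bucket distribution.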

The same issue arises when an additional CNF is added. When all flows are initially distributed over TOR-2, TOR-3, and TOR-4, and CNF-10 is added afterward, sticky ECMP ensures that only a subset of the flows is affected. The total ECMP weight is initially 1 + 3 = 4; after CNF-10 is added, it becomes 1 + 3 + 1 = 5. This implies that 80% of the existing flows via TOR-2, TOR-3, or TOR-4 remain unchanged, while 20% of the flows move to CNF-10 with TOR-5 as next-hop; see Sticky ECMP flow distribution when one next-hop is added for 10.10.0.0/24 in the Appendix.
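
The percentages follow directly from the weights. As a minimal worked example in Python (the function name is hypothetical, and the calculation assumes an ideal proportional split rather than the exact bucket counts listed in the Appendix):

```python
from fractions import Fraction


def moved_share(existing_weights, added_weight):
    """Fraction of flows expected to move to a newly added next-hop."""
    return Fraction(added_weight, sum(existing_weights) + added_weight)


# TOR-2/TOR-3 combined = 1, TOR-4 = 3, newly added TOR-5 = 1
moved = moved_share([1, 3], 1)
kept = 1 - moved
print(f"moved: {float(moved):.0%}, kept: {float(kept):.0%}")  # moved: 20%, kept: 80%
```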

The sticky-ecmp command enables stickiness and is configurable in policy actions.

[ex:/configure policy-options]
A:admin@BL-1# tree flat detail | match sticky-ecmp
policy-statement <string> default-action sticky-ecmp <boolean>
policy-statement <string> entry <number> action sticky-ecmp <boolean>
policy-statement <string> named-entry <string> action sticky-ecmp <boolean>

The sticky-ecmp command only has effect in a BGP import policy applied to one or more BGP peers in the base router or in a service; it has no effect in a BGP export policy.

Configuration

Sticky ECMP can be combined with regular ECMP, weighted ECMP, and IP aliasing ECMP. In all examples in the following sections, weighted ECMP is enabled on BL-1 and all TORs; IP aliasing ECMP is configured on TOR-2 and TOR-3.

Sticky ECMP for EVPN IFL over MPLS

Example topology - IFL EVPN-MPLS shows the example topology with an EVPN-MPLS network in autonomous system (AS) 64500 with border leaf BL-1 and four TORs:
  • TOR-2 and TOR-3 are both connected to CNF-6 and have IP aliasing ECMP
  • TOR-4 is connected to CNF-7, CNF-8, and CNF-9
  • TOR-5 is connected to CNF-10
The CNFs in AS 64501 connect to an anycast services network. The ECMP distribution of the flows from the BL to the TORs is weighted based on the number of CNFs advertising the same anycast network to the TORs, so TOR-4 receives three times as many flows as TOR-5 or the combination of TOR-2 and TOR-3.
Figure 3. Example topology - IFL EVPN-MPLS

Configuration

The initial configuration includes:

  • cards, MDAs, ports
  • router interfaces
  • SR-ISIS on the router interfaces in AS 64500
  • IBGP between route reflector (RR) BL-1 and each TOR for the EVPN address family

Sticky ECMP is configured on BL-1 only, using the following policy, which adds stickiness to prefix 10.10.0.0/24 in VPRN-10:

# on BL-1:
configure {
    policy-options {
        community "comm-10" {
            member "target:64500:10" { }
        }
        prefix-list "cnf_ips-10" {
            prefix 10.10.0.0/24 type longer {
            }
        }
        policy-statement "import-add-stickiness-vprn-10" {
            entry 10 {
                from {
                    prefix-list ["cnf_ips-10"]
                    community {
                        name "comm-10"
                    }
                }
                action {
                    action-type accept
                    sticky-ecmp true     # add stickiness
                }
            }
            entry 11 {
                from {
                    community {
                        name "comm-10"
                    }
                }
                action {
                    action-type accept
                }
            }
        }

The sticky-ecmp command only has effect in import policies. It suffices to configure this policy on one BGP peer in the base router on BL-1, as follows. For the same destination 10.10.0.0/24, the router programs the next-hops (192.0.2.2, 192.0.2.3, 192.0.2.4, and 192.0.2.5) as sticky even if only BGP peer 192.0.2.2 is configured with this import policy.

# on RR BL-1:
configure {
    router "Base" {
        bgp {
            vpn-apply-export true
            vpn-apply-import true
            rapid-withdrawal true
            split-horizon true
            rapid-update {
                evpn true
            }
            group "TORs" {
                type internal
                peer-as 64500
                family {
                    evpn true
                }
                cluster {
                    cluster-id 192.0.2.1
                }
            }
            neighbor "192.0.2.2" {
                group "TORs"
                import {
                    policy ["import-add-stickiness-vprn-10"]
                }
            }
            neighbor "192.0.2.3" {
                group "TORs"
            }
            neighbor "192.0.2.4" {
                group "TORs"
            }
            neighbor "192.0.2.5" {
                group "TORs"
            }
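
The rule that one matching import policy makes all next-hops for the destination sticky can be expressed as a small model. The Python sketch below only restates the behavior described above; it is not taken from the SR OS implementation, and the function and variable names are hypothetical.

```python
def program_next_hops(marked_by_policy):
    """Toy model: if any path to a destination is marked sticky, all next-hops are."""
    route_is_sticky = any(marked_by_policy.values())
    return {nh: route_is_sticky for nh in marked_by_policy}


# Only the path learned from peer 192.0.2.2 matches the import policy.
marked_by_policy = {
    "192.0.2.2": True,
    "192.0.2.3": False,
    "192.0.2.4": False,
    "192.0.2.5": False,
}
programmed = program_next_hops(marked_by_policy)
print(programmed)
```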

Alternatively, the import policy can be applied at service level, as described later in this chapter.

On the TORs, the BGP configuration does not include such an import policy:

# on TOR-2, TOR-3, TOR-4, TOR-5:
configure {
    router "Base" {
        autonomous-system 64500
        bgp {
            vpn-apply-export true
            vpn-apply-import true
            rapid-withdrawal true
            rapid-update {
                evpn true
            }
            group "BL" {
                type internal
                peer-as 64500
                family {
                    evpn true
                }
            }
            neighbor "192.0.2.1" {
                group "BL"
            }
        }

The configuration of VPRN-10 on BL-1 is as follows:

# on BL-1:
configure {
    service {
        vprn "VPRN-10" {
            admin-state enable
            description "EVPN-MPLS IFL VPRN-10"
            service-id 10
            customer "1"
            ecmp 10
            bgp-evpn {
                mpls 1 {
                    admin-state enable
                    route-distinguisher "192.0.2.1:10"
                    evi 10
                    vrf-target {
                        community "target:64500:10"
                    }
                    auto-bind-tunnel {
                        resolution any
                    }
                    evpn-link-bandwidth {
                        weighted-ecmp true
                        advertise {
                        }
                    }
                }
            }
            interface "test-10" {
                ipv4 {
                    primary {
                        address 172.20.10.1
                        prefix-length 30
                    }
                }
                sap 1/1/c10/1:10 {
                }
            }
        }

The service configuration on TOR-2 is as follows. The Ethernet segment is configured on TOR-2 and TOR-3 for IP aliasing ECMP. EBGP is configured in VPRN-10 with neighbor CNF-6 on TOR-2 only, not on TOR-3.

# on TOR-2:
configure {
    service {
        system {
            bgp {
                evpn {
                    ethernet-segment "ES-10" {               # same on TOR-3
                        admin-state enable
                        type virtual
                        esi 00:00:00:00:00:23:23:23:10:00
                        multi-homing-mode all-active
                        association {
                            vprn-next-hop 10.100.10.1 {     # IP alias on CNF-6
                                virtual-ranges {
                                    evi 10 { }
                                }
                            }
                        }
                    }
                }
            }
        }
        vprn "VPRN-10" {
            admin-state enable
            description "EVPN-MPLS IFL VPRN-10"
            service-id 10
            customer "1"
            autonomous-system 64500
            ecmp 10
            router-id 192.0.2.2                           # on TOR-3: 192.0.2.3
            bgp-evpn {
                mpls 1 {
                    admin-state enable
                    route-distinguisher "192.0.2.2:10"  # on TOR-3: 192.0.2.3:10
                    evi 10
                    vrf-target {
                        community "target:64500:10"
                    }
                    auto-bind-tunnel {
                        resolution any
                    }
                    evpn-link-bandwidth {
                        weighted-ecmp true
                        advertise {
                        }
                    }
                }
            }
            bgp {                 # only on TOR-2; no EBGP in VPRN-10 on TOR-3
                router-id 10.100.10.2
                rapid-withdrawal true
                split-horizon true
                ebgp-default-reject-policy {
                    import false
                }
                group "PE-CE-10" {
                    type external
                    peer-as 64501
                    family {
                        ipv4 true
                    }
                    export {
                        policy ["export-evpn-ifl-bgp"]
                    }
                }
                neighbor "10.100.10.1" {
                    group "PE-CE-10"
                    local-address 10.100.10.2
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
            }
            interface "int-VPRN10-TOR-2-to-CNF-6" {    # "...-TOR-3-..."
                ipv4 {
                    primary {
                        address 10.10.26.1            # on TOR-3: 10.10.36.1/24
                        prefix-length 24
                    }
                }
                sap 1/1/c3/1:10 {
                }
            }
            interface "loopback" {                # only on TOR-2; not on TOR-3
                loopback true
                ipv4 {
                    primary {
                        address 10.100.10.2
                        prefix-length 32
                    }
                }
            }
            static-routes {
                route 10.100.10.1/32 route-type unicast {
                    next-hop "10.10.26.2" {             # on TOR-3: 10.10.36.2
                        admin-state enable
                    }
                }
            }
        }

The nodes in AS 64500 exchange EVPN IFL routes for VPRN-10, while the EBGP sessions between VPRN-10 on the TORs and the base router on the CNFs exchange BGP IPv4 routes. The export policy "export-evpn-ifl-bgp" in VPRN-10 on the TORs is needed to export BGP routes for the corresponding EVPN IFL routes:

# on TOR-2, TOR-3, TOR-4, TOR-5:
configure {
    policy-options {
        policy-statement "export-evpn-ifl-bgp" {
            description "export from EVPN-IFL to BGP"
            entry 10 {
                from {
                    protocol {
                        name [evpn-ifl]
                    }
                }
                to {
                    protocol {
                        name [bgp]
                    }
                }
                action {
                    action-type accept
                }
            }

On TOR-4, VPRN-10 is configured as follows:

# on TOR-4:
configure {
    service {
        vprn "VPRN-10" {
            admin-state enable
            description "EVPN-MPLS IFL VPRN-10"
            service-id 10
            customer "1"
            autonomous-system 64500
            ecmp 10
            bgp-evpn {
                mpls 1 {
                    admin-state enable
                    route-distinguisher "192.0.2.4:10"
                    evi 10
                    vrf-target {
                        community "target:64500:10"
                    }
                    auto-bind-tunnel {
                        resolution any
                    }
                    evpn-link-bandwidth {
                        weighted-ecmp true
                        advertise {
                        }
                    }
                }
            }
            bgp {
                rapid-withdrawal true
                split-horizon true
                ebgp-default-reject-policy {
                    import false
                }
                multipath {
                    family ipv4 {
                        max-paths 10
                    }
                }
                group "PE-CE-10" {
                    type external
                    peer-as 64501
                    family {
                        ipv4 true
                    }
                    export {
                        policy ["export-evpn-ifl-bgp"]
                    }
                }
                neighbor "10.10.47.2" {
                    group "PE-CE-10"
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
                neighbor "10.10.48.2" {
                    group "PE-CE-10"
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
                neighbor "10.10.49.2" {
                    group "PE-CE-10"
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
            }
            interface "int-VPRN10-TOR-4-CNF-7" {
                ipv4 {
                    primary {
                        address 10.10.47.1
                        prefix-length 24
                    }
                }
                sap 1/1/c4/1:10 {
                }
            }
            interface "int-VPRN10-TOR-4-CNF-8" {
                ipv4 {
                    primary {
                        address 10.10.48.1
                        prefix-length 24
                    }
                }
                sap 1/1/c5/1:10 {
                }
            }
            interface "int-VPRN10-TOR-4-CNF-9" {
                ipv4 {
                    primary {
                        address 10.10.49.1
                        prefix-length 24
                    }
                }
                sap 1/1/c6/1:10 {
                }
            }
        }

For VPRN-10, TOR-5 only has neighbor CNF-10 and the configuration is as follows:

# on TOR-5:
configure {
    service {
        vprn "VPRN-10" {
            admin-state enable
            description "EVPN-MPLS IFL VPRN-10"
            service-id 10
            customer "1"
            autonomous-system 64500
            ecmp 10
            bgp-evpn {
                mpls 1 {
                    admin-state enable
                    route-distinguisher "192.0.2.5:10"
                    evi 10
                    vrf-target {
                        community "target:64500:10"
                    }
                    auto-bind-tunnel {
                        resolution any
                    }
                    evpn-link-bandwidth {
                        weighted-ecmp true
                        advertise {
                        }
                    }
                }
            }
            bgp {
                rapid-withdrawal true
                split-horizon true
                ebgp-default-reject-policy {
                    import false
                }
                group "PE-CE-10" {
                    type external
                    peer-as 64501
                    family {
                        ipv4 true
                    }
                    export {
                        policy ["export-evpn-ifl-bgp"]
                    }
                }
                neighbor "10.10.105.2" {
                    group "PE-CE-10"
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
            }
            interface "int-VPRN10-TOR-5-CNF-10" {
                ipv4 {
                    primary {
                        address 10.10.105.1
                        prefix-length 24
                    }
                }
                sap 1/1/c3/1:10 {
                }
            }
        }

The BGP configuration on the corresponding CNF-10 is as follows:

# on CNF-10 (simulated by SR OS node):
configure {
    policy-options {
        community "comm-10" {
            member "target:64500:10" { }
        }
        prefix-list "anycast-ip-10" {
            prefix 10.10.0.0/24 type exact {
            }
        }
        policy-statement "export-anycast-ip-10" {
            entry 10 {
                from {
                    prefix-list ["anycast-ip-10"]
                    protocol {
                        name [direct]
                    }
                }
                action {
                    action-type accept
                    community {
                        add ["comm-10"]      # for VPRN-10 on TORs
                    }
                }
            }
        }
    }
    router "Base" {
        autonomous-system 64501
        bgp {
            rapid-withdrawal true
            split-horizon true
            ebgp-default-reject-policy {
                import false
            }
            group "PE-CE-10" {
            }
            neighbor "10.10.105.1" {
                group "PE-CE-10"
                type external
                peer-as 64500
                local-as {
                    as-number 64501
                }
                export {
                    policy ["export-anycast-ip-10"]
                }
            }
        }

The configuration on the other CNFs is similar. The BGP configuration of CNF-6 uses the alias IP 10.100.10.1, as follows:

# on CNF-6 (simulated by SR OS node):
configure {
    router "Base" {
        bgp {
            rapid-withdrawal true
            split-horizon true
            ebgp-default-reject-policy {
                import false
            }
            group "PE-CE-10" {
            }
            neighbor "10.100.10.2" {    # neighbor reachable via static route
                group "PE-CE-10"
                local-address 10.100.10.1
                type external
                peer-as 64500
                local-as {
                    as-number 64501
                }
                export {
                    policy ["export-anycast-ip-10"]
                }
            }
        }

Verification

BL-1 has stickiness applied for destination 10.10.0.0/24 on IBGP peer 192.0.2.2. For the same destination 10.10.0.0/24, BL-1 programs the next-hops as sticky even if only one of them is configured with sticky ECMP. The following route table for prefix 10.10.0.0 in VPRN-10 shows that sticky ECMP is not only requested for the EVPN IFL route with next-hop 192.0.2.2, but also for the EVPN IFL routes with next-hops 192.0.2.3, 192.0.2.4, or 192.0.2.5, as indicated with [S]:

[/]
A:admin@BL-1# show router service-name "VPRN-10" route-table 10.10.0.0

===============================================================================
Route Table (Service: 10)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.2 (tunneled:SR-ISIS:524291)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.3 (tunneled:SR-ISIS:524295)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.4 (tunneled:SR-ISIS:524299)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.5 (tunneled:SR-ISIS:524303)                          10
-------------------------------------------------------------------------------
No. of Routes: 4
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

Stickiness is applied for all routes to the same destination, regardless of the weight of these routes.
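
When this check must be repeated, for example in a lab script, the [S] flag can be extracted from captured output. The following Python sketch assumes the text layout of the route table shown above; the regular expressions and the function name are illustrative, and screen scraping of this kind may break if the output format changes.

```python
import re

# Excerpt of the route table output shown above.
SAMPLE = """\
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.2 (tunneled:SR-ISIS:524291)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.3 (tunneled:SR-ISIS:524295)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.4 (tunneled:SR-ISIS:524299)                          10
10.10.0.0/24   [S]                            Remote  EVPN-IFL  00h02m50s  170
       192.0.2.5 (tunneled:SR-ISIS:524303)                          10
"""

# A route line starts with the prefix, optionally followed by flags in brackets.
ROUTE_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+/\d+)\s+(?:\[(\w+)\])?\s+(\S+)\s+(\S+)")
# A next-hop line is indented and has the tunnel details in parentheses.
NEXTHOP_RE = re.compile(r"^\s+(\d+\.\d+\.\d+\.\d+)\s+\(")


def sticky_next_hops(text):
    """Map each next-hop to True if its route line carries the S flag."""
    result = {}
    flags = ""
    for line in text.splitlines():
        route = ROUTE_RE.match(line)
        if route:
            flags = route.group(2) or ""
            continue
        next_hop = NEXTHOP_RE.match(line)
        if next_hop:
            result[next_hop.group(1)] = "S" in flags
    return result


print(sticky_next_hops(SAMPLE))
```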

On BL-1, the extensive route table for prefix 10.10.0.0 shows that sticky ECMP is enabled (Sticky ECMP: Yes). The ECMP weight is different for the different next-hops, but the stickiness applies to all next-hops.

[/]
A:admin@BL-1# show router service-name "VPRN-10" route-table 10.10.0.0 extensive

===============================================================================
Route Table (Service: 10)
===============================================================================
Dest Prefix             : 10.10.0.0/24
  Protocol              : EVPN-IFL
  Age                   : 00h03m34s
  Preference            : 170
  Sticky ECMP           : Yes
  Indirect Next-Hop     : 192.0.2.2
    Label               : 524283
    VPN Next-Hop Index  : 23
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 192.0.2.2 (SR-ISIS tunnel:524291)
      Metric            : 10
      ECMP-Weight       : N/A
  Indirect Next-Hop     : 192.0.2.3
    Label               : 524281
    VPN Next-Hop Index  : 20
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 192.0.2.3 (SR-ISIS tunnel:524295)
      Metric            : 10
      ECMP-Weight       : N/A
  Indirect Next-Hop     : 192.0.2.4
    Label               : 524281
    VPN Next-Hop Index  : 25
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 3
    Resolving Next-Hop  : 192.0.2.4 (SR-ISIS tunnel:524299)
      Metric            : 10
      ECMP-Weight       : N/A
  Indirect Next-Hop     : 192.0.2.5
    Label               : 524283
    VPN Next-Hop Index  : 27
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 192.0.2.5 (SR-ISIS tunnel:524303)
      Metric            : 10
      ECMP-Weight       : N/A
-------------------------------------------------------------------------------
No. of Destinations: 1
===============================================================================

The FIB for prefix 10.10.0.0/24 includes the S-flag, as follows:

[/]
A:admin@BL-1# show router service-name "VPRN-10" fib 1 ip-prefix-prefix-length 10.10.0.0/24

===============================================================================
FIB Display
===============================================================================
Prefix [Flags]                                              Protocol
  NextHop
-------------------------------------------------------------------------------
10.10.0.0/24 [S]                                            EVPN-IFL
  192.0.2.2 (VPRN Label:524283 Transport:SR-ISIS:524291)
  192.0.2.3 (VPRN Label:524281 Transport:SR-ISIS:524295)
  192.0.2.4 (VPRN Label:524281 Transport:SR-ISIS:524299)
  192.0.2.5 (VPRN Label:524283 Transport:SR-ISIS:524303)
-------------------------------------------------------------------------------
Total Entries : 1
-------------------------------------------------------------------------------
Flags : S = sticky ECMP supported; R = missing hardware resources
-------------------------------------------------------------------------------
===============================================================================

TOR-2, TOR-4, and TOR-5 have a BGP route for anycast prefix 10.10.0.0/24 in the route table for VPRN-10 with default preference 170 and these TORs generate an EVPN IFL route for this anycast prefix. BL-1 receives the following three IP prefix routes for prefix 10.10.0.0/24:

[/]
A:admin@BL-1# show router bgp routes evpn ip-prefix prefix 10.10.0.0/24
===============================================================================
 BGP Router ID:192.0.2.1        AS:64500       Local AS:64500
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
Flag  Route Dist.         Prefix
      Tag                 Gw Address
                          NextHop
                          Label
                          ESI
-------------------------------------------------------------------------------
u*>i  192.0.2.2:10        10.10.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.2
                          LABEL 524283
                          00:00:00:00:00:23:23:23:10:00    # EVPN IP aliasing

u*>i  192.0.2.4:10        10.10.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.4
                          LABEL 524281
                          ESI-0

u*>i  192.0.2.5:10        10.10.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.5
                          LABEL 524283
                          ESI-0

-------------------------------------------------------------------------------
Routes : 3
===============================================================================

The detailed information of these IP prefix routes includes the Sticky flag for the route with next-hop 192.0.2.2, which is the peer that is configured with the import policy to add stickiness:

[/]
A:admin@BL-1# show router bgp routes evpn ip-prefix prefix 10.10.0.0/24 hunt
===============================================================================
 BGP Router ID:192.0.2.1        AS:64500       Local AS:64500
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
-------------------------------------------------------------------------------
RIB In Entries
-------------------------------------------------------------------------------
Network        : n/a
Nexthop        : 192.0.2.2
Path Id        : None
From           : 192.0.2.2
Res. Nexthop   : 192.168.12.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-2
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:10 evpn-bandwidth:1:1 bgp-tunnel-encap:MPLS
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.2
Origin         : IGP
Flags          : Used Valid Best Sticky
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : 00:00:00:00:00:23:23:23:10:00
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.10.0.0/24
Route Dist.    : 192.0.2.2:10
MPLS Label     : LABEL 524283
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h04m46s

-------------------------------------------------------------------------------

Network        : n/a
Nexthop        : 192.0.2.4
Path Id        : None
From           : 192.0.2.4
Res. Nexthop   : 192.168.14.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-4
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:10 evpn-bandwidth:1:3 bgp-tunnel-encap:MPLS
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.4
Origin         : IGP
Flags          : Used Valid Best
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : ESI-0
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.10.0.0/24
Route Dist.    : 192.0.2.4:10
MPLS Label     : LABEL 524281
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h04m30s

-------------------------------------------------------------------------------

Network        : n/a
Nexthop        : 192.0.2.5
Path Id        : None
From           : 192.0.2.5
Res. Nexthop   : 192.168.15.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-5
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:10 evpn-bandwidth:1:1 bgp-tunnel-encap:MPLS
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.5
Origin         : IGP
Flags          : Used Valid Best
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : ESI-0
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.10.0.0/24
Route Dist.    : 192.0.2.5:10
MPLS Label     : LABEL 524283
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h04m21s

-------------------------------------------------------------------------------
RIB Out Entries
-------------------------------------------------------------------------------
---snip---

Sticky ECMP for EVPN IFL over SRv6

Example topology - IFL EVPN-SRv6 shows the topology with VPRN-20 in an EVPN SRv6 network.

Figure 4. Example topology - IFL EVPN-SRv6

Configuration

The initial configuration includes:
  • cards, MDAs, ports
  • router interfaces
  • IS-IS on the router interfaces of BL-1 and the TORs, except for the router interfaces between TORs and CNFs
  • SRv6 on BL-1 and the TORs
  • IBGP on BL-1 and the TORs with BL-1 acting as RR

On BL-1, the policy "add-stickiness-vprn-20" can be applied at BGP peer level or at service level. In the following configuration, the policy is applied as a vrf-import policy in VPRN-20:

# on BL-1:
configure {
    policy-options {
        community "comm-20" {
            member "target:64500:20" { }
        }
        prefix-list "cnf_ips-20" {
            prefix 10.20.0.0/24 type longer {
            }
        }
        policy-statement "AS-20" {
            entry 1 {
                action {
                    action-type accept
                    as-path-prepend {
                        as-path 64503
                    }
                    community {
                        add ["comm-20"]
                    }
                }
            }
        }
        policy-statement "add-stickiness-vprn-20" {
            entry 12 {
                from {
                    prefix-list ["cnf_ips-20"]
                    community {
                        name "comm-20"
                    }
                }
                action {
                    action-type accept
                    sticky-ecmp true
                }
            }
            entry 13 {
                from {
                    community {
                        name "comm-20"
                    }
                }
                action {
                    action-type accept
                }
            }
        }
    }
    service {
        vprn "VPRN-20" {
            admin-state enable
            description "IFL-SRv6"
            service-id 20
            customer "1"
            ecmp 10
            segment-routing-v6 1 {
                locator "BL1-loc" {
                    function {
                        end-dt4 {
                        }
                        end-dt6 {
                        }
                        end-dt46 {
                        }
                    }
                }
            }
            bgp-evpn {
                segment-routing-v6 1 {
                    admin-state enable
                    route-distinguisher "192.0.2.1:20"
                    source-address 2001:db8::2:1
                    evi 20
                    vrf-target {
                        community "target:64500:20"
                    }
                    vrf-import {
                        policy ["add-stickiness-vprn-20"]
                    }
                    vrf-export {
                        policy ["AS-20"]
                    }
                    srv6 {
                        instance 1
                        default-locator "BL1-loc"
                    }
                    evpn-link-bandwidth {
                        weighted-ecmp true
                        advertise {
                        }
                    }
                }
            }
            interface "test-20" {
                ipv4 {
                    primary {
                        address 172.20.20.1
                        prefix-length 30
                    }
                }
                sap 1/1/c10/1:20 {
                }
            }

The configuration of VPRN-20 on the TORs is similar, but without the stickiness. IP aliasing ECMP is implemented on TOR-2 and TOR-3 in a way similar to that described in Sticky ECMP for EVPN IFL over MPLS. The configuration of the CNFs is also similar.
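
The entry order in the "add-stickiness-vprn-20" policy matters: entry 12 marks only the anycast prefixes as sticky, while entry 13 accepts all other routes that carry the route target. The following Python sketch illustrates that first-match logic. It is not SR OS code; the function names are hypothetical, and it assumes that "type longer" matches the configured prefix and any longer prefix within it. The default action for routes that match no entry is not modeled.

```python
import ipaddress

# Values copied from the BL-1 configuration above.
CNF_PREFIX = ipaddress.ip_network("10.20.0.0/24")  # prefix-list "cnf_ips-20"
COMM_20 = "target:64500:20"                         # community "comm-20"


def matches_longer(route, configured=CNF_PREFIX):
    """Assumed 'type longer' semantics: same or longer prefix inside the configured one."""
    return ipaddress.ip_network(route).subnet_of(configured)


def evaluate(route, communities):
    """Return (action, sticky) for the first matching entry, or None if nothing matches."""
    # entry 12: prefix-list AND community must both match -> accept and request stickiness
    if matches_longer(route) and COMM_20 in communities:
        return ("accept", True)
    # entry 13: community only -> accept without stickiness
    if COMM_20 in communities:
        return ("accept", False)
    return None  # default action of the policy, not modeled in this sketch


print(evaluate("10.20.0.0/24", {COMM_20}))    # anycast prefix
print(evaluate("172.20.20.0/30", {COMM_20}))  # other route with the same route target
```

In this sketch, only the first call returns a sticky result; the second route is accepted without stickiness.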

Verification

On BL-1, the routes for prefix 10.20.0.0 have sticky ECMP enabled for all next-hops, as follows:

[/]
A:admin@BL-1# show router service-name "VPRN-20" route-table 10.20.0.0/24

===============================================================================
Route Table (Service: 20)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.20.0.0/24   [S]                            Remote  EVPN-IFL  00h03m58s  170
       2001:db8:aaaa:102:7b1d:b000:: (tunneled:SRV6)                10
10.20.0.0/24   [S]                            Remote  EVPN-IFL  00h03m58s  170
       2001:db8:aaaa:103:7b1d:9000:: (tunneled:SRV6)                10
10.20.0.0/24   [S]                            Remote  EVPN-IFL  00h03m58s  170
       2001:db8:aaaa:104:7b1d:9000:: (tunneled:SRV6)                10
10.20.0.0/24   [S]                            Remote  EVPN-IFL  00h03m58s  170
       2001:db8:aaaa:105:7b1d:b000:: (tunneled:SRV6)                10
-------------------------------------------------------------------------------
No. of Routes: 4
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

The extensive route table also shows the stickiness, which applies to all next-hops regardless of their ECMP weights.

[/]
A:admin@BL-1# show router service-name "VPRN-20" route-table 10.20.0.0 extensive

===============================================================================
Route Table (Service: 20)
===============================================================================
Dest Prefix             : 10.20.0.0/24
  Protocol              : EVPN-IFL
  Age                   : 00h02m05s
  Preference            : 170
  Sticky ECMP           : Yes
  Indirect Next-Hop     : 192.0.2.2
    SRV6 SID            : 2001:db8:aaaa:102:7b1d:b000::
    VPN Next-Hop Index  : 33
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 2001:db8:aaaa:102:7b1d:b000:: (SRV6 tunnel)
      Metric            : 10
      ECMP-Weight       : 1
  Indirect Next-Hop     : 192.0.2.3
    SRV6 SID            : 2001:db8:aaaa:103:7b1d:9000::
    VPN Next-Hop Index  : 35
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 2001:db8:aaaa:103:7b1d:9000:: (SRV6 tunnel)
      Metric            : 10
      ECMP-Weight       : 1
  Indirect Next-Hop     : 192.0.2.4
    SRV6 SID            : 2001:db8:aaaa:104:7b1d:9000::
    VPN Next-Hop Index  : 37
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 3
    Resolving Next-Hop  : 2001:db8:aaaa:104:7b1d:9000:: (SRV6 tunnel)
      Metric            : 10
      ECMP-Weight       : 3
  Indirect Next-Hop     : 192.0.2.5
    SRV6 SID            : 2001:db8:aaaa:105:7b1d:b000::
    VPN Next-Hop Index  : 38
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    ECMP-Weight         : 1
    Resolving Next-Hop  : 2001:db8:aaaa:105:7b1d:b000:: (SRV6 tunnel)
      Metric            : 10
      ECMP-Weight       : 1
-------------------------------------------------------------------------------
No. of Destinations: 1
===============================================================================

BL-1 uses the following EVPN IP prefix routes:

[/]
A:admin@BL-1# show router bgp routes evpn ip-prefix prefix 10.20.0.0/24
===============================================================================
 BGP Router ID:192.0.2.1        AS:64500       Local AS:64500
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
Flag  Route Dist.         Prefix
      Tag                 Gw Address
                          NextHop
                          Label
                          ESI
-------------------------------------------------------------------------------
u*>i  192.0.2.2:20        10.20.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.2
                          504283
                          00:00:00:00:00:23:23:23:20:00    # IP aliasing ECMP

u*>i  192.0.2.4:20        10.20.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.4
                          504281
                          ESI-0

u*>i  192.0.2.5:20        10.20.0.0/24
      0                   00:00:00:00:00:00
                          192.0.2.5
                          504283
                          ESI-0

-------------------------------------------------------------------------------
Routes : 3
===============================================================================

The detailed information of these IP prefix routes includes the Sticky flag for all next-hops, as follows:

[/]
A:admin@BL-1# show router bgp routes evpn ip-prefix prefix 10.20.0.0/24 hunt
===============================================================================
 BGP Router ID:192.0.2.1        AS:64500       Local AS:64500
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
-------------------------------------------------------------------------------
RIB In Entries
-------------------------------------------------------------------------------
Network        : n/a
Nexthop        : 192.0.2.2
Path Id        : None
From           : 192.0.2.2
Res. Nexthop   : 192.168.12.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-2
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:20 evpn-bandwidth:1:1
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.2
Origin         : IGP
Flags          : Used Valid Best Sticky
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : 00:00:00:00:00:23:23:23:20:00
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.20.0.0/24
Route Dist.    : 192.0.2.2:20
MPLS Label     : 504283
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h01m13s
SRv6 TLV Type  : SRv6 L3 Service TLV (5)
SRv6 SubTLV    : SRv6 SID Information (1)
Sid            : 2001:db8:aaaa:102::
Full Sid       : 2001:db8:aaaa:102:7b1d:b000::
Behavior       : End.DT4 (19)
SRv6 SubSubTLV : SRv6 SID Structure (1)
Loc-Block-Len  : 48                     Loc-Node-Len   : 16
Func-Len       : 20                     Arg-Len        : 0
Tpose-Len      : 20                     Tpose-offset   : 64

-------------------------------------------------------------------------------

Network        : n/a
Nexthop        : 192.0.2.4
Path Id        : None
From           : 192.0.2.4
Res. Nexthop   : 192.168.14.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-4
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:20 evpn-bandwidth:1:3
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.4
Origin         : IGP
Flags          : Used Valid Best Sticky
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : ESI-0
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.20.0.0/24
Route Dist.    : 192.0.2.4:20
MPLS Label     : 504281
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h00m58s
SRv6 TLV Type  : SRv6 L3 Service TLV (5)
SRv6 SubTLV    : SRv6 SID Information (1)
Sid            : 2001:db8:aaaa:104::
Full Sid       : 2001:db8:aaaa:104:7b1d:9000::
Behavior       : End.DT4 (19)
SRv6 SubSubTLV : SRv6 SID Structure (1)
Loc-Block-Len  : 48                     Loc-Node-Len   : 16
Func-Len       : 20                     Arg-Len        : 0
Tpose-Len      : 20                     Tpose-offset   : 64

-------------------------------------------------------------------------------

Network        : n/a
Nexthop        : 192.0.2.5
Path Id        : None
From           : 192.0.2.5
Res. Nexthop   : 192.168.15.2
Local Pref.    : 100                    Interface Name : int-BL-1-TOR-5
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   IGP Cost       : 10
Connector      : None
Community      : target:64500:20 evpn-bandwidth:1:1
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.0.2.5
Origin         : IGP
Flags          : Used Valid Best Sticky
Route Source   : Internal
AS-Path        : 64501
EVPN type      : IP-PREFIX
ESI            : ESI-0
Tag            : 0
Gateway Address: 00:00:00:00:00:00
Prefix         : 10.20.0.0/24
Route Dist.    : 192.0.2.5:20
MPLS Label     : 504283
Route Tag      : 0
Neighbor-AS    : 64501
DB Orig Val    : N/A                    Final Orig Val : N/A
Source Class   : 0                      Dest Class     : 0
Add Paths Send : Default
Last Modified  : 00h00m45s
SRv6 TLV Type  : SRv6 L3 Service TLV (5)
SRv6 SubTLV    : SRv6 SID Information (1)
Sid            : 2001:db8:aaaa:105::
Full Sid       : 2001:db8:aaaa:105:7b1d:b000::
Behavior       : End.DT4 (19)
SRv6 SubSubTLV : SRv6 SID Structure (1)
Loc-Block-Len  : 48                     Loc-Node-Len   : 16
Func-Len       : 20                     Arg-Len        : 0
Tpose-Len      : 20                     Tpose-offset   : 64

-------------------------------------------------------------------------------
RIB Out Entries
-------------------------------------------------------------------------------
---snip---
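
The preceding output shows a Sid without function bits, a Full Sid, and the transposition length and offset. The following Python sketch illustrates how these values appear to relate: the 20-bit value in the MPLS Label field is placed at bit offset 64 of the Sid to obtain the Full Sid. The function name is hypothetical and the sketch is only checked against the values displayed above; it is not a description of the SR OS implementation.

```python
import ipaddress


def full_sid(sid, label, tpose_len, tpose_offset):
    """Insert the transposed label bits into the SID at the given bit offset."""
    base = int(ipaddress.IPv6Address(sid))
    shift = 128 - tpose_offset - tpose_len   # distance of the transposed field from the LSB
    mask = (1 << tpose_len) - 1              # keep only tpose_len bits of the label value
    return str(ipaddress.IPv6Address(base | ((label & mask) << shift)))


# Values taken from the route received from 192.0.2.2 (Tpose-Len 20, Tpose-offset 64).
print(full_sid("2001:db8:aaaa:102::", 504283, 20, 64))
```

With the values for the route from 192.0.2.2, the result corresponds to the Full Sid in the output, because 504283 is 0x7b1db in hexadecimal.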

Sticky ECMP for EVPN IFF over VXLAN

Example topology - IFF EVPN-VXLAN shows the topology with R-VPLS BD-31 linked to VPRN-30 in an EVPN-VXLAN network.

Figure 5. Example topology - IFF EVPN-VXLAN

Configuration

The initial configuration includes:
  • cards, MDAs, ports
  • router interfaces
  • IS-IS on the router interfaces of BL-1 and the TORs, except for the router interfaces between TORs and CNFs
  • IBGP on BL-1 and the TORs with BL-1 acting as RR

On BL-1, the import policy adds ECMP stickiness to the IP prefix routes imported in R-VPLS BD-31. This import policy is applied as vsi-import in R-VPLS BD-31, but it can also be applied at BGP peer level. The configuration on BL-1 is as follows:

# on BL-1:
configure {
    policy-options {
        community "comm-31" {
            member "target:64500:31" { }
        }
        prefix-list "cnf_ips-30" {
            prefix 10.30.0.0/24 type longer {
            }
        }
        policy-statement "import-add-stickiness-rvpls-31" {
            entry 10 {
                from {
                    prefix-list ["cnf_ips-30"]
                    community {
                        name "comm-31"
                    }
                }
                action {
                    action-type accept
                    sticky-ecmp true
                }
            }
            entry 11 {
                from {
                    community {
                        name "comm-31"
                    }
                }
                action {
                    action-type accept
                }
            }
        }
    }
    service {
        vpls "BD-31" {
            admin-state enable
            description "broadcast domain 31 connected to VPRN-30"
            service-id 31
            customer "1"
            vxlan {
                instance 1 {
                    vni 31
                }
            }
            routed-vpls {
            }
            bgp 1 {
                vsi-import ["import-add-stickiness-rvpls-31"]
            }
            bgp-evpn {
                evi 31
                routes {
                    ip-prefix {
                        advertise true
                        link-bandwidth {
                            weighted-ecmp true
                            advertise {
                            }
                        }
                    }
                }
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                }
            }
        }
        vprn "VPRN-30" {
            admin-state enable
            service-id 30
            customer "1"
            ecmp 10
            interface "int-BD-31" {
                vpls "BD-31" {
                    evpn-tunnel {
                    }
                }
            }
            interface "test-30" {
                loopback true
                ipv4 {
                    primary {
                        address 172.20.30.1
                        prefix-length 30
                    }
                }
            }
        }

The configuration on TOR-2 is as follows. The EVI in the ES for IP aliasing ECMP corresponds to the R-VPLS BD-31.

# on TOR-2:
configure {
    policy-options {
        policy-statement "export-to-bgp" {
            description "export to BGP" # export from any protocol (here: EVPN IFF) to BGP
            entry 10 {
                to {
                    protocol {
                        name [bgp]
                    }
                }
                action {
                    action-type accept
                }
            }
        }
    }
    service {
        system {
            bgp {
                evpn {
                    ethernet-segment "ES-31" {
                        admin-state enable
                        type virtual
                        esi 00:00:00:00:00:23:23:23:31:00
                        multi-homing-mode all-active
                        association {
                            vprn-next-hop 10.100.30.1 {
                                virtual-ranges {
                                    evi 31 { }
                                }
                            }
                        }
                    }
                }
            }
        }
        vpls "BD-31" {
            admin-state enable
            service-id 31
            customer "1"
            vxlan {
                instance 1 {
                    vni 31
                }
            }
            routed-vpls {
            }
            bgp 1 {
            }
            bgp-evpn {
                evi 31
                routes {
                    ip-prefix {
                        advertise true
                        link-bandwidth {
                            weighted-ecmp true
                            advertise {
                            }
                        }
                    }
                }
                vxlan 1 {
                    admin-state enable
                    vxlan-instance 1
                    mh-mode network
                    routes {
                        auto-disc {
                            advertise true
                        }
                    }
                }
            }
        }
        vprn "VPRN-30" {
            admin-state enable
            service-id 30
            customer "1"
            autonomous-system 64500
            ecmp 10
            bgp {
                preference 168      # must be preferred over EVPN IFF routes (169)
                router-id 10.100.30.2
                rapid-withdrawal true
                ebgp-default-reject-policy {
                    import false
                }
                group "PE-CE" {
                    type external
                    peer-as 64501
                    export {
                        policy ["export-to-bgp"]
                    }
                }
                neighbor "10.100.30.1" {
                    group "PE-CE"
                    evpn-link-bandwidth {
                        add-to-received-bgp 1
                    }
                }
            }
            interface "int-BD-31" {
                vpls "BD-31" {
                    evpn-tunnel {
                    }
                }
            }
            interface "int-VPRN30-TOR-2-to-CNF-6" {
                ipv4 {
                    primary {
                        address 10.30.26.1
                        prefix-length 24
                    }
                }
                sap 1/1/c3/1:30 {
                }
            }
            interface "loopback" {
                loopback true
                ipv4 {
                    primary {
                        address 10.100.30.2
                        prefix-length 32
                    }
                }
            }
            static-routes {
                route 10.100.30.1/32 route-type unicast {
                    next-hop "10.30.26.2" {
                        admin-state enable
                    }
                }
            }
        }

The configuration on the other TORs is similar, but the interface addresses are used instead of the IP alias.

EVPN IFF routes have a default preference of 169, whereas EVPN IFL routes have a default preference of 170. The TORs with EBGP sessions to the CNFs receive the EBGP route for the anycast prefix with the default BGP preference of 170 and advertise an EVPN IFF route for the anycast prefix. With default settings, when such a TOR receives EVPN IFF routes for the anycast prefix from the other TORs, those routes (preference 169) are preferred over its own EBGP route (preference 170). The TOR then installs the EVPN IFF route in the route table for VPRN-30 and withdraws the EVPN IFF route that it generated itself, so BL-1 does not have an EVPN IFF route to each TOR. To prefer the EBGP route over the EVPN IFF routes, the preference of the EBGP routes is configured with a value lower than 169 (168 in this example).
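
The effect of the preference values can be illustrated with a minimal Python sketch. It assumes only that the route with the numerically lowest preference is selected; the dictionaries and the function name are hypothetical and do not represent SR OS data structures.

```python
def best_route(candidates):
    """Pick the route with the numerically lowest preference (lower is preferred)."""
    return min(candidates, key=lambda route: route["preference"])


evpn_iff = {"proto": "EVPN-IFF", "preference": 169}  # default EVPN IFF preference
ebgp_default = {"proto": "BGP", "preference": 170}   # default BGP preference
ebgp_tuned = {"proto": "BGP", "preference": 168}     # value configured on the TORs

# With defaults, the received EVPN IFF route wins and the local IFF route is withdrawn.
print(best_route([ebgp_default, evpn_iff])["proto"])
# With preference 168, the EBGP route wins and the TOR keeps advertising its IFF route.
print(best_route([ebgp_tuned, evpn_iff])["proto"])
```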

Verification

TOR-2, TOR-4, and TOR-5 install the prefix 10.30.0.0 in VPRN-30 with the configured preference 168; on TOR-2 as follows:

[/]
A:admin@TOR-2# show router 30 route-table 10.30.0.0 

===============================================================================
Route Table (Service: 30)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric   
-------------------------------------------------------------------------------
10.30.0.0/24                                  Remote  BGP       00h01m14s  168
       10.30.26.2                                                   1
-------------------------------------------------------------------------------
No. of Routes: 1

The BGP route for prefix 10.30.0.0/24 is preferred over the EVPN IFF routes for the same anycast prefix, so TOR-2, TOR-4, and TOR-5 each generate an EVPN IFF route for prefix 10.30.0.0/24. BL-1 receives the following three IP prefix routes for prefix 10.30.0.0/24:

[/]
A:admin@BL-1# show router bgp routes evpn ip-prefix prefix 10.30.0.0/24
===============================================================================
 BGP Router ID:192.0.2.1        AS:64500       Local AS:64500
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN IP-Prefix Routes
===============================================================================
Flag  Route Dist.         Prefix
      Tag                 Gw Address
                          NextHop
                          Label
                          ESI
-------------------------------------------------------------------------------
u*>i  192.0.2.2:31        10.30.0.0/24
      0                   00:02:fe:ff:ff:5c
                          192.0.2.2
                          VNI 31
                          00:00:00:00:00:23:23:23:31:00

u*>i  192.0.2.4:31        10.30.0.0/24
      0                   00:04:fe:ff:ff:5c
                          192.0.2.4
                          VNI 31
                          ESI-0

u*>i  192.0.2.5:31        10.30.0.0/24
      0                   00:05:fe:ff:ff:5c
                          192.0.2.5
                          VNI 31
                          ESI-0

-------------------------------------------------------------------------------
Routes : 3
===============================================================================

On BL-1, the EVPN-IFF routes for prefix 10.30.0.0 in VPRN-30 have stickiness for all next-hops, as follows:

[/]
A:admin@BL-1# show router service-name "VPRN-30" route-table 10.30.0.0

===============================================================================
Route Table (Service: 30)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric
-------------------------------------------------------------------------------
10.30.0.0/24   [S]                            Remote  EVPN-IFF  00h00m26s  169
       int-BD-31 (ET-00:02:fe:ff:ff:5c)                             0
10.30.0.0/24   [S]                            Remote  EVPN-IFF  00h00m26s  169
       int-BD-31 (ET-00:03:fe:ff:ff:5c)                             0
10.30.0.0/24   [S]                            Remote  EVPN-IFF  00h00m26s  169
       int-BD-31 (ET-00:04:fe:ff:ff:5c)                             0
10.30.0.0/24   [S]                            Remote  EVPN-IFF  00h00m26s  169
       int-BD-31 (ET-00:05:fe:ff:ff:5c)                             0
-------------------------------------------------------------------------------
No. of Routes: 4
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================

The extensive route table also shows the stickiness and the ECMP weight of each next-hop:

[/]
A:admin@BL-1# show router service-name "VPRN-30" route-table 10.30.0.0 extensive

===============================================================================
Route Table (Service: 30)
===============================================================================
Dest Prefix             : 10.30.0.0/24
  Protocol              : EVPN-IFF
  Age                   : 00h00m58s
  Preference            : 169
  Sticky ECMP           : Yes
  Next-Hop              : int-BD-31 (ET-00:02:fe:ff:ff:5c)
    Interface           : int-BD-31
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    Metric              : 0
    ECMP-Weight         : 1
  Next-Hop              : int-BD-31 (ET-00:03:fe:ff:ff:5c)
    Interface           : int-BD-31
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    Metric              : 0
    ECMP-Weight         : 1
  Next-Hop              : int-BD-31 (ET-00:04:fe:ff:ff:5c)
    Interface           : int-BD-31
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    Metric              : 0
    ECMP-Weight         : 3
  Next-Hop              : int-BD-31 (ET-00:05:fe:ff:ff:5c)
    Interface           : int-BD-31
    QoS                 : Priority=n/c, FC=n/c
    Source-Class        : 0
    Dest-Class          : 0
    Metric              : 0
    ECMP-Weight         : 1
-------------------------------------------------------------------------------
No. of Destinations: 1
===============================================================================

Conclusion

When an additional EVPN IP prefix route with a new next-hop is advertised for a prefix, or when an EVPN IP prefix route for that prefix is withdrawn, the number of paths changes and, therefore, so does the flow distribution. When one of the next-hops is withdrawn, sticky ECMP redistributes only the affected flows. When a next-hop is added, sticky ECMP minimizes the impact on existing flows. This limits the number of TCP resets. The stickiness is associated solely with next-hops and not with links at LAG level.

Appendix

The sticky ECMP implementation is based on software. The ECMP behavior is emulated by repeating each ECMP next-hop of the sticky route a number of times, depending on the normalized weight of the next-hop, in different hashing buckets. The assignment of hashing buckets is not based on the number of existing next-hops for a route, but on the maximum number of internal hashing buckets, which is 64 in FP-based platforms and 16 in IXR platforms.

Note: The closer the number of next-hops is to the maximum number of ECMP paths (64 for SR OS), the worse the distribution algorithm works. (In the example, only three next-hops are used for 64 ECMP paths.)
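
The repetition of next-hops over the hashing buckets can be sketched in Python as follows. This is an illustration only, not the SR OS algorithm: it assumes a simple cyclic fill in which each next-hop appears as many times as its normalized weight, and the function name is hypothetical. The actual bucket assignment depends on earlier changes to the ECMP set.

```python
from collections import Counter

NUM_BUCKETS = 64  # internal hashing buckets on FP-based platforms, per the text above


def fill_buckets(weights, num_buckets=NUM_BUCKETS):
    """weights maps next-hop -> normalized weight; returns the next-hop for each bucket."""
    # Repeat every next-hop 'weight' times, then cycle that pattern over all buckets.
    pattern = [nh for nh, weight in weights.items() for _ in range(weight)]
    return [pattern[i % len(pattern)] for i in range(num_buckets)]


buckets = fill_buckets({1: 1, 2: 3, 3: 1})
print(buckets[:10])      # start of the bucket table
print(Counter(buckets))  # number of buckets per next-hop
```

With weights 1, 3, and 1, next-hop 2 owns 39 of the 64 buckets, which is close to the ideal 3/5 share.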

Sticky ECMP flow distribution when one next-hop is removed for 10.10.0.0/24 compares the initial sticky ECMP distribution with three next-hops with the sticky ECMP flow distribution when next-hop 3 is removed as in Redistributed traffic flows after CNF-10 is removed. Next-hop 2 has weight 3 while next-hops 1 and 3 have weight 1.

Table 1. Sticky ECMP flow distribution when one next-hop is removed for 10.10.0.0/24
Columns 1-2: initial sticky ECMP distribution with next-hop 1 (weight 1), next-hop 2 (weight 3), and next-hop 3 (weight 1)
Columns 3-4: sticky ECMP distribution after next-hop 3 fails
bucket next-hop bucket next-hop
00 1 00 1
01 2 01 2
02 2 02 2
03 2 03 2
04 3 04 1
05 1 05 1
06 2 06 2
07 2 07 2
08 2 08 2
09 3 09 2
10 1 10 1
11 2 11 2
12 2 12 2
13 2 13 2
14 3 14 2
15 1 15 1
16 2 16 2
17 2 17 2
18 2 18 2
19 3 19 2
20 1 20 1
21 2 21 2
22 2 22 2
23 2 23 2
24 3 24 1
25 1 25 1
26 2 26 2
27 2 27 2
28 2 28 2
29 3 29 2
30 1 30 1
31 2 31 2
32 2 32 2
33 2 33 2
34 3 34 2
35 1 35 1
36 2 36 2
37 2 37 2
38 2 38 2
39 3 39 2
40 1 40 1
41 2 41 2
42 2 42 2
43 2 43 2
44 3 44 1
45 1 45 1
46 2 46 2
47 2 47 2
48 2 48 2
49 3 49 2
50 1 50 1
51 2 51 2
52 2 52 2
53 2 53 2
54 3 54 2
55 1 55 1
56 2 56 2
57 2 57 2
58 2 58 2
59 3 59 2
60 1 60 1
61 2 61 2
62 2 62 2
63 2 63 2

All existing flows with next-hop 1 (TOR-2) or next-hop 2 (TOR-4) remain unchanged; only the flows in the 12 buckets of next-hop 3 (TOR-5) are redistributed over the remaining paths according to the weighted ECMP set: 3 buckets move to next-hop 1 and 9 buckets move to next-hop 2.

Similarly, when the initial ECMP distribution has two next-hops (next-hop 1 with weight 1 and next-hop 2 with weight 3) and a third next-hop (next-hop 3 with weight 1) is added, the stickiness ensures that only about 20% of the flows are redistributed (12 of the 64 buckets), as shown in Sticky ECMP flow distribution when one next-hop is added for 10.10.0.0/24. The initial distribution differs from the one in the preceding table.

Table 2. Sticky ECMP flow distribution when one next-hop is added for 10.10.0.0/24
Left columns: initial sticky ECMP distribution with next-hop 1 (weight 1) and next-hop 2 (weight 3). Right columns: sticky ECMP distribution after next-hop 3 (weight 1) is added.
bucket next-hop bucket next-hop
00 1 00 1
01 2 01 2
02 2 02 2
03 2 03 2
04 1 04 3
05 2 05 2
06 2 06 2
07 2 07 2
08 1 08 1
09 2 09 3
10 2 10 2
11 2 11 2
12 1 12 1
13 2 13 2
14 2 14 3
15 2 15 2
16 1 16 1
17 2 17 2
18 2 18 2
19 2 19 3
20 1 20 1
21 2 21 2
22 2 22 2
23 2 23 2
24 1 24 3
25 2 25 2
26 2 26 2
27 2 27 2
28 1 28 1
29 2 29 3
30 2 30 2
31 2 31 2
32 1 32 1
33 2 33 2
34 2 34 3
35 2 35 2
36 1 36 1
37 2 37 2
38 2 38 2
39 2 39 3
40 1 40 1
41 2 41 2
42 2 42 2
43 2 43 2
44 1 44 3
45 2 45 2
46 2 46 2
47 2 47 2
48 1 48 1
49 2 49 3
50 2 50 2
51 2 51 2
52 1 52 1
53 2 53 2
54 2 54 3
55 2 55 2
56 1 56 1
57 2 57 2
58 2 58 2
59 2 59 3
60 1 60 1
61 2 61 2
62 2 62 2
63 2 63 2
Note: With sticky ECMP, the distribution over the hashing buckets is not deterministic. The distribution at any moment is the result of the changes (added or deleted next-hops) that happened beforehand, because sticky ECMP keeps as many flows as possible unchanged with each change.
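
The addition case in Table 2 can be sketched in the same way. The following Python example is an illustration under assumptions, not the SR OS algorithm: the helper names fill_buckets and add_next_hop are hypothetical, and the starting point is a simple cyclic fill rather than a distribution shaped by earlier changes. The new next-hop receives only the buckets that a fresh weighted fill would assign to it; every other bucket keeps its current next-hop.

```python
def fill_buckets(weights, num_buckets=64):
    """Illustrative initial fill: cycle the weighted next-hop pattern."""
    pattern = [nh for nh, w in weights.items() for _ in range(w)]
    return [pattern[i % len(pattern)] for i in range(num_buckets)]


def add_next_hop(buckets, weights, new_nh, new_weight):
    """Give the new next-hop only the buckets a fresh weighted fill assigns it."""
    target = fill_buckets({**weights, new_nh: new_weight}, len(buckets))
    return [new_nh if t == new_nh else old for old, t in zip(buckets, target)]


weights = {1: 1, 2: 3}
before = fill_buckets(weights)
after = add_next_hop(before, weights, new_nh=3, new_weight=1)

moved = sum(b != a for b, a in zip(before, after))
print(moved, f"{moved / len(before):.2%}")               # 12 18.75%
print(after.count(1), after.count(2), after.count(3))    # 13 39 12
```

Under these assumptions, 12 of the 64 buckets change owner (3 from next-hop 1 and 9 from next-hop 2), which matches the right-hand columns of Table 2.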