IP ECMP Load Balancing

Equal-cost multipath (ECMP) refers to the distribution of packets over two or more outgoing links that share the same routing cost. Static, IS-IS, OSPF, and BGP routes to IPv4 and IPv6 destinations can be programmed into the datapath by their respective applications with multiple IP ECMP next hops.

The SR Linux device load-balances traffic over multiple equal-cost links with a hashing algorithm that uses header fields from incoming packets to calculate which link to use. When an IPv4 or IPv6 packet is received on a subinterface and matches a route with multiple IP ECMP next hops, the next hop that forwards the packet is selected based on a computation using this hashing algorithm. The goal of the hash computation is to keep packets in the same flow on the same network path, while distributing traffic proportionally across the ECMP next hops, so that each of the N ECMP next hops carries approximately 1/Nth of the load.
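The per-flow selection described above can be sketched as follows. This is an illustrative model only, not the actual SR Linux hash algorithm; `zlib.crc32` over the 5-tuple stands in for the platform-specific hash function.

```python
import zlib

def select_next_hop(flow, next_hops, hash_seed=0):
    """Pick an ECMP next hop for a flow (5-tuple).

    Packets of the same flow always hash to the same next hop, so the
    flow stays on one network path; across many flows the load spreads
    roughly 1/N per next hop.
    """
    # Serialize the 5-tuple: src IP, dst IP, IP protocol, L4 src/dst port.
    key = repr(flow).encode()
    h = zlib.crc32(key, hash_seed)
    return next_hops[h % len(next_hops)]

nhops = ["nh1", "nh2", "nh3", "nh4"]
flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)
# The same flow maps to the same next hop on every invocation:
assert select_next_hop(flow, nhops) == select_next_hop(flow, nhops)
```

Note that changing the hash-seed changes the flow-to-next-hop mapping, which is what the hash polarization guidance later in this section relies on.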

The hash computation takes various key and packet header field values as inputs and returns a value that indicates the next hop. The key and field values that can be used by the hash computation depend on the platform, packet type, and configuration options, as follows:

On 7250 IXR platforms, the following can be used in the hash computation:

  • Hash-seed (0 to 65535)

    On 7250 IXR-6, 7250 IXR-10, 7250 IXR-X1b, and 7250 IXR-X3b devices, the hash-seed can be system-generated (the default) or user-specified. If the hash-seed is system-generated, SR Linux generates a hash-seed using the least-significant 16 bits of the base chassis MAC address.

    On 7250 IXR-6e, 7250 IXR-10e, and 7250 IXR-18e devices with 36 x 800 IMM, the system randomly generates a per-interface hash-seed using the chassis base MAC address and the 64-bit port ID as inputs; this ensures that the hash-seed is the same after every restart of the port, LAG, or IMM.

  • On Gen 3 linecards on 7250 IXR-6e, 7250 IXR-10e, and 7250 IXR-18e devices, a hash-seed can be specified within a hash-profile. When the hash-profile is applied to an interface, the specified hash-seed takes effect for the interface.
  • For IPv4 TCP/UDP non-fragmented packets: source IPv4 address, destination IPv4 address, IP protocol, Layer 4 source port, Layer 4 destination port. The algorithm is asymmetric; that is, inverting source and destination pairs does not produce the same result.
  • For IPv6 TCP/UDP non-fragmented packets: source IPv6 address, destination IPv6 address, IPv6 flow label (even if it is 0), IP protocol (IPv6 next-header value in the last extension header), Layer 4 source port, Layer 4 destination port. The algorithm is symmetric; that is, inverting source and destination pairs produces the same result.
  • For all other packets: source IPv4 or IPv6 address, destination IPv4 or IPv6 address.

On 7250 IXR, 7220 IXR-H4, 7220 IXR-H5, 7250 IXR-X1b, and 7250 IXR-X3b devices, if an IP packet being forwarded has a UDP destination port of 4791, indicating that it carries an RDMA over Converged Ethernet version 2 (RoCEv2) payload, the 24-bit Dest Queue-pair value in the RoCEv2 header (BTH+) is added to the hash computation. In this case, hashing is based on the existing 5-tuple flow plus the Dest Queue-pair value. The system locates an IP packet's Dest Queue-pair value based on the format of the BTH+ header: it is the 24-bit value offset 5 bytes from the end of the UDP header. On 7220 IXR-H4 and 7220 IXR-H5 devices, the UDF mechanism is used to match qualifying packets and extract the Dest Queue-pair value from the specified offset.
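The offset arithmetic can be illustrated with a short sketch (a hypothetical parser, not the datapath implementation): the BTH+ header immediately follows the 8-byte UDP header, and the Dest Queue-pair is the 24-bit big-endian field starting 5 bytes into it.

```python
ROCEV2_UDP_DPORT = 4791  # UDP destination port that identifies RoCEv2

def dest_queue_pair(udp_payload: bytes) -> int:
    """Extract the 24-bit Dest Queue-pair from a RoCEv2 BTH+ header.

    The field sits 5 bytes past the end of the UDP header, i.e. at
    bytes 5..7 of the UDP payload.
    """
    return int.from_bytes(udp_payload[5:8], "big")

# Illustrative BTH layout: opcode, flags, partition key (2 bytes),
# reserved byte, then the 3-byte Dest Queue-pair.
bth = bytes([0x64, 0x00, 0xFF, 0xFF, 0x00]) + (0x000ABC).to_bytes(3, "big")
assert dest_queue_pair(bth) == 0x000ABC
```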

On 7220 IXR-D1, 7220 IXR-D2, 7220 IXR-D3, 7220 IXR-H2, and 7220 IXR-H3 devices, the following can be used in the hash computation:

  • Hash-seed (0 to 65535), which can be system-generated (the default) or user-specified. If the hash-seed is system-generated, SR Linux generates a hash-seed using the least-significant 16 bits of the base chassis MAC address.
  • For IPv4 TCP/UDP non-fragmented packets: VLAN ID, source IPv4 address, destination IPv4 address, IP protocol, Layer 4 source port, Layer 4 destination port. The algorithm is asymmetric.
  • For IPv6 TCP/UDP non-fragmented packets: VLAN ID, source IPv6 address, destination IPv6 address, IPv6 flow label (even if it is 0), IP protocol (IPv6 next-header value in the last extension header), Layer 4 source port, Layer 4 destination port.
  • For all other packets: source IPv4 or IPv6 address, destination IPv4 or IPv6 address.

On 7215 IXS platforms, the following can be used in the hash computation:

  • Source IP address
  • Destination IP address
  • Layer 4 source port
  • Layer 4 destination port
  • Hash seed
  • IPv6 flow label
  • Received MPLS labels (terminated and non-terminated)
  • IP protocol number

Avoiding hash polarization

Hash polarization occurs when the hash algorithm selects ECMP next-hops inefficiently; for example, when the system always chooses the same next-hop for specific packet flows. Hash polarization can occur when adjacent routers use the same hash-seed.

To avoid hash polarization effects, ensure that directly connected nodes have unique hash-seeds. You can do this by explicitly configuring the hash-seeds, or by verifying that the state value of system-generated hash-seeds is different on adjacent routers.

To check the state value of system-generated hash-seeds, use the info from state command.

The following example displays the system-wide hash-seed (either user-configured or system-generated) on 7220 IXR and 7250 IXR platforms:

--{ + running }--[  ]--
# info with-context from state system load-balancing hash-options
    system {
        load-balancing {
            hash-options {
                hash-seed 2203
            }
        }
    }

The following example displays the system-generated, interface-specific hash-seed on 7250 IXR-6e, 7250 IXR-10e, and 7250 IXR-18e platforms:

--{ + running }--[  ]--
# info with-context from state interface ethernet-1/1 load-balancing hash-seed
    interface ethernet-1/1 {
        load-balancing {
            hash-seed 41521
        }
    }

Checking hash polynomials (7250 IXR platforms)

On 7250 IXR platforms, the system computes a set of load-balancing keys for each received packet. Packets that belong to the same 5-tuple flow have the same load-balancing keys, ensuring they follow the same path through the network and do not get misordered.

For each received packet, the system computes load-balancing keys for the following clients or "hash-users":

  • key 1 is used to select an ECMP level 1 FEC member
  • key 2 is used to select an ECMP level 2 FEC member
  • key 3 is used to select an ECMP level 3 FEC member
  • key 4 is used to select a LAG member
  • key 5 is used to generate a value to be stamped into a network header at egress (for example, IPv6 flow label or VXLAN UDP source port)

To create the load-balancing keys, the system sends a master key (computed from the combined CRC values of each of the packet header layers), along with the user-configured or system-generated hash-seed, to one of eight polynomial functions available on the device. Each hash-user is assigned one of the eight polynomial functions. The system uses the master key and the hash-seed as input to the polynomial function, which returns the load-balancing key for the hash-user as output.
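The two-stage computation can be modeled roughly as follows. This is illustrative only: the real device polynomials, CRC widths, and master-key derivation are platform internals, and the stand-in functions here are hypothetical.

```python
import zlib

# Hypothetical stand-ins for the eight on-device polynomial functions:
# each one mixes the master key and hash-seed differently.
POLYNOMIALS = [
    lambda key, seed, i=i: zlib.crc32((key ^ seed ^ i).to_bytes(8, "big"))
    for i in range(8)
]

def load_balancing_key(master_key: int, hash_seed: int,
                       polynomial_id: int) -> int:
    """Return a hash-user's load-balancing key from the master key
    (derived from the packet headers) and the hash-seed."""
    return POLYNOMIALS[polynomial_id](master_key, hash_seed)

# Two adjacent routers using the same polynomial AND the same hash-seed
# compute identical keys for every flow (polarization risk); giving one
# router a different seed breaks the correlation.
mk = 0x1234ABCD
assert load_balancing_key(mk, 2203, 1) == load_balancing_key(mk, 2203, 1)
assert load_balancing_key(mk, 2203, 1) != load_balancing_key(mk, 41521, 1)
```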

There is a greater risk of hash polarization if two adjacent routers use the same polynomial function for the same hash-user, instead of two different polynomial functions. You can use an info from state command to display the polynomial function assigned to each hash-user; for example:

--{ + running }--[  ]--
# info with-context from state platform linecard 1 forwarding-complex 1 load-balancing hash-user * hash-polynomial
    platform {
        linecard 1 {
            forwarding-complex 1 {
                load-balancing {
                    hash-user level-1-fec {
                        hash-polynomial 1
                    }
                    hash-user level-2-fec {
                        hash-polynomial 2
                    }
                    hash-user level-3-fec {
                        hash-polynomial 3
                    }
                    hash-user lag {
                        hash-polynomial 4
                    }
                    hash-user network-header {
                        hash-polynomial 5
                    }
                }
            }
        }
    }

If adjacent routers use the same hash-polynomial function for the same hash-user, you can avoid potential hash polarization by changing the hash-seed on one of the routers.

Configuring IP ECMP load balancing

To configure IP ECMP load balancing, you specify hash-options that are used as input fields for the hash calculation, which determines the next hop for packets matching routes with multiple ECMP next hops.

Configure hash options for IP ECMP load balancing

The following example configures hash options for IP ECMP load balancing, including a hash-seed and packet header field values to be used in the hash computation.

--{ * candidate shared default }--[  ]--
# info with-context system load-balancing
    system {
        load-balancing {
            hash-options {
                hash-seed 128
                ipv6-flow-label false
            }
        }
    }

On 7250 IXR-6, 7250 IXR-10, 7250 IXR-X1b, and 7250 IXR-X3b devices, if no value is configured for the hash-seed, the system by default generates a hash-seed using the least-significant 16 bits of the base chassis MAC address. If a hash-option is not explicitly configured as either true or false, the default for the hash option is true. The user-configured or system-generated hash-seed applies system-wide.

On 7250 IXR-6e, 7250 IXR-10e, and 7250 IXR-18e devices, SR Linux randomly generates a per-interface hash-seed using the chassis base MAC address and the 64-bit port ID as inputs.

On 7250 IXR devices, if source-address is configured as a hash option, the destination-address must also be configured as a hash option. Similarly, if source-port is configured as a hash option, the destination-port must also be configured as a hash option.

Configure IP ECMP load balancing based only on IPv4 source and destination address (7250 IXR-X1b only)

The following example configures the hash options so that the SR Linux device load-balances IPv4 traffic using only the IPv4 source and destination address. In this example, fields such as Layer 4 protocol and Layer 4 source/destination ports are not used in the load-balancing calculation. As a result, all IPv4 traffic with the same source and destination address pair is always forwarded to the same next-hop and out the same port.

--{ * candidate shared default }--[  ]--
# info with-context system load-balancing hash-options
    system {
        load-balancing {
            hash-options {
                destination-address true
                destination-port false
                ipv6-flow-label false
                protocol false
                source-address true
                source-port false
                mpls-label-stack false
            }
        }
    }

Configure a hash-profile and apply it to an interface

On 7250 IXR-6e, 7250 IXR-10e, and 7250 IXR-18e devices (Gen 3 linecards only), you can specify a hash-seed in a hash-profile and apply the hash-profile to an interface. The interface uses the hash-seed configured in the hash profile.

The following example configures a hash-seed in a hash-profile, then applies the hash-profile to an interface.

--{ * candidate shared default }--[  ]--
# info with-context system load-balancing hash-profile p1
    system {
        load-balancing {
            hash-profile p1 {
                hash-seed 123
            }
        }
    }
--{ * candidate shared default }--[  ]--
# info with-context interface ethernet-1/1 load-balancing hash-profile
    interface ethernet-1/1 {
        load-balancing {
            hash-profile p1
        }
    }

Resilient ECMP hashing

Note: Resilient ECMP hashing is supported on the following devices: 7250 IXR-6, 7250 IXR-10, 7250 IXR-6e, 7250 IXR-10e, 7250 IXR-18e, 7250 IXR-X1b, 7250 IXR-X3b, 7220 IXR-D1, 7220 IXR-D2, and 7220 IXR-D3.

For some IP prefixes with IP ECMP next-hops, it may be advantageous to move as few flows as possible when removing or adding members to the ECMP set; for example, when the ECMP next-hops of an IP route correspond to network appliances or host servers that maintain state for the flows they service, and moving flows requires state to be rebuilt. To do this, SR Linux supports resilient hashing, which can minimize hash bucket reassignment during changes to the ECMP set.

Because resilient hashing consumes extra datapath resources, no IP ECMP routes are programmed for resilient hashing by default. To enable resilient hashing for an IP ECMP route, the prefix of the route must be matched by an entry configured in a resilient-hash-prefix list. A resilient-hash-prefix list entry specifies two parameters: the number of hash buckets per path (hash-buckets-per-path value) and the maximum number of paths (max-paths value). Together, these parameters determine a hash-bucket fill pattern for the matching prefix. The hash-buckets-per-path value is the number of times each next-hop is repeated in the fill pattern when there are max-paths ECMP next-hops.

For example, if the hash-buckets-per-path value is 4 and the max-paths value is 6, then the fill pattern has 24 hash buckets, offering space for 6 next-hops, each repeated 4 times. A route matching this resilient-hash-prefix list entry is initially programmed with 4 next-hops. The initial programming of its next-hop-group looks like this:

[nh1, nh2, nh3, nh4, (nh1), (nh1)],

[nh1, nh2, nh3, nh4, (nh2), (nh2)],

[nh1, nh2, nh3, nh4, (nh3), (nh3)],

[nh1, nh2, nh3, nh4, (nh4), (nh4)]

Next-hops in parentheses represent repeated next-hops that are leaving space for future next-hop additions.

If a new ECMP next-hop nh5 is added to the route, then the fill pattern becomes the following:

[nh1, nh2, nh3, nh4, nh5, (nh1)],

[nh1, nh2, nh3, nh4, nh5, (nh2)],

[nh1, nh2, nh3, nh4, nh5, (nh3)],

[nh1, nh2, nh3, nh4, nh5, (nh4)]

If instead of adding an ECMP next-hop, the existing ECMP next-hop nh4 is deleted, then the fill pattern becomes the following:

[nh1, nh2, nh3, (nh1), (nh1), (nh1)],

[nh1, nh2, nh3, (nh2), (nh2), (nh2)],

[nh1, nh2, nh3, (nh3), (nh3), (nh3)],

[nh1, nh2, nh3, (nh1), (nh2), (nh3)]

The resilient hashing algorithm does not guarantee that the traffic share of each ECMP member is equally affected by a next-hop addition or removal. The example above shows that when the route goes from 3 to 4 ECMP next-hops, the existing nh1 next-hop loses two hash buckets (out of 24), while the existing nh2 and nh3 next-hops lose only one hash bucket each. The resilient hashing algorithm focuses on minimizing hash bucket reassignment throughout the changes.
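The fill patterns in the examples above can be reproduced with a short sketch. This is a plausible reconstruction inferred from the examples, not the actual SR Linux bucket-assignment algorithm, which is internal to the platform.

```python
def fill_pattern(next_hops, hash_buckets_per_path, max_paths):
    """Build a resilient-hash fill pattern: hash_buckets_per_path rows
    of max_paths buckets each, padded with repeated next-hops that hold
    space for future next-hop additions."""
    # Platform-defined limit on total hash buckets per prefix.
    assert hash_buckets_per_path * max_paths <= 128
    n = len(next_hops)
    rows = []
    for r in range(hash_buckets_per_path):
        row = list(next_hops)
        for j in range(max_paths - n):
            # Rows beyond the number of next-hops cycle through all of
            # them; earlier rows repeat a single next-hop.
            row.append(next_hops[(r + j) % n if r >= n else r % n])
        rows.append(row)
    return rows

# Reproduces the initial 4-next-hop pattern shown above (24 buckets):
pattern = fill_pattern(["nh1", "nh2", "nh3", "nh4"], 4, 6)
assert pattern[0] == ["nh1", "nh2", "nh3", "nh4", "nh1", "nh1"]
```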

On both 7250 IXR and 7220 IXR systems, the product of the hash-buckets-per-path value and the max-paths value must be less than or equal to a platform-defined limit of 128.

Note: If a route is covered by a resilient-hash-prefix list entry but has only a single next-hop (non-ECMP), the resilient flag displays as true in the route-table state, even though multiple hash buckets are not allocated.

Resilient hashing interaction with weighted ECMP

When a BGP, IS-IS, or gRIBI route is eligible for both weighted ECMP and resilient hashing, the resilient hashing configuration overrides the weighted ECMP configuration. For example, if max-ecmp-hash-buckets-per-next-hop-group is 256 and the resilient-hash-prefix configuration for the prefix specifies a max-paths value of 4 and a hash-buckets-per-path value of 4, then 16 hash buckets are used for this prefix and the number of ECMP members is capped at 4.

Resilient hashing interaction with tunnels and ILM entries

Incoming Label Mapping (ILM) and tunnel programming are not affected when a tunnel destination is matched by a resilient-hash-prefix list entry; more specifically, the ILM entry or tunnel is not programmed for resilient ECMP hashing even if it is matched by a resilient-hash-prefix list entry.

This applies to the following tunnel types:

  • SR-ISIS
  • uncolored SR policy
  • BGP-LU
  • VXLAN
  • GRE

This applies to the following ILM types:

  • LDP
  • SR-ISIS
  • BGP-LU

Configuring resilient hashing

To configure an IP ECMP route for resilient hashing, you configure the route prefix in a resilient-hash-prefix list and specify the number of hash buckets per path and the maximum number of ECMP next-hop paths per route.

The following example configures an entry in a resilient-hash-prefix list. Active routes in the FIB that exactly match this prefix or are longer matches of this prefix are provided with resilient-hash programming. The hash-buckets-per-path value is 4 and the max-paths value is 6, so the fill pattern has 24 hash buckets, offering space for 6 next-hops, each repeated 4 times.

--{ +* candidate shared default }--[  ]--
# info with-context network-instance default ip-load-balancing resilient-hash-prefix *
    network-instance default {
        ip-load-balancing {
            resilient-hash-prefix 10.10.10.0/24 {
                hash-buckets-per-path 4
                max-paths 6
            }
        }
    }

Dynamic Load Balancing

7220 IXR-Hx platforms support dynamic load balancing for ECMP distribution of packets over outgoing links. Dynamic load balancing improves on hash-based load balancing by considering the state of aggregate ECMP group members when assigning flows to groups.

At packet ingress, a flow is identified, and the state of the flow is evaluated. Based on this evaluation, the flow is assigned to an aggregate ECMP group. The dynamic load balancing algorithm analyzes the aggregate ECMP groups and detects ECMP load imbalances among the egress paths based on three load-balancing factors: egress port utilization, port queue fill size, and Ingress Traffic Manager (ITM) port queue size. When the algorithm detects an ECMP load imbalance, it can reassign flows to different aggregate ECMP groups so that balance is restored.

Dynamic load balancing is enabled for specific prefixes in a network-instance. The following options are configurable at the system level, which applies them to all dynamic load balancing ECMP groups.

  • flowset-size: the number of flow entries reserved for each aggregate ECMP group.
  • inactivity-timer: the amount of time a flow must be idle before it is eligible for reassignment to a different aggregate ECMP group member.
  • mode: whether the system can reassign an inactive flow to a different aggregate ECMP group member after the initial assignment (flow-dynamic), whether flows do not move after the initial assignment (flow-fixed), or whether load balancing is performed on a per-packet basis, distributing packets across all active next-hops (per-packet).
  • link-quality-sampling-interval: how often the system samples the link quality.

Dynamic load balancing is supported for physical interfaces only; it is not supported for LAG interfaces. Dynamic load balancing is limited to unicast traffic.

Configuring dynamic load balancing

You can configure options to adjust how the dynamic load balancing algorithm balances traffic, the network prefixes for which traffic is load balanced, and thresholds for monitoring resources used by dynamic load balancing.

Configure system options for dynamic load balancing

The following example configures system-level parameters to adjust the dynamic load balancing algorithm.

--{ +* candidate shared default }--[  ]--
# info with-context system load-balancing dynamic
    system {
        load-balancing {
            dynamic {
                flowset-size 512
                inactivity-timer 100
                link-quality-sampling-interval 7
                weighting-factor {
                    port-utilization 70
                    queue-utilization 20
                    itm-utilization 10
                }
            }
        }
    }

In this example, a flow must be inactive for a minimum of 100 microseconds before it can be moved to a better quality interface. Link quality is evaluated at an interval of 7 microseconds. The dynamic load balancing algorithm is configured so that port utilization is weighted most heavily.
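The weighted evaluation in this example can be sketched as follows. The formula is a hypothetical illustration of how the three weighting factors might combine; the actual metric computation is internal to the platform.

```python
def link_quality(port_util, queue_util, itm_util, weights=(70, 20, 10)):
    """Combine the three load-balancing factors into one utilization
    score using the configured weighting factors (which sum to 100).
    Lower utilization means a better-quality link."""
    w_port, w_queue, w_itm = weights
    assert w_port + w_queue + w_itm == 100
    return (w_port * port_util + w_queue * queue_util + w_itm * itm_util) / 100

# With port-utilization weighted heaviest, a lightly loaded port scores
# better (lower) than a congested one even if its queues are fuller:
assert link_quality(10, 50, 50) < link_quality(90, 10, 10)
```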

Configure the mode for dynamic load balancing

By default, dynamic load balancing reassigns an inactive flow to a different aggregate ECMP group after the initial assignment. You can optionally configure the mode for dynamic load balancing to either per-packet or flow-dynamic. In either mode, the system considers the port load quality and chooses the best quality path: in per-packet mode, each packet is assigned to the best quality path; in flow-dynamic mode, the entire flow is assigned to the best quality path.

The following example sets the dynamic load balancing mode to per-packet.

--{ * candidate shared default }--[  ]--
# info with-context system load-balancing dynamic
    system {
        load-balancing {
            dynamic {
                mode per-packet
            }
        }
    }

Configure network prefixes for dynamic load balancing

Dynamic load balancing is enabled on an IP prefix-by-prefix basis within a network-instance. Routes matching the specified prefix have dynamic load balancing enabled on their associated ECMP next-hop group.

The following example enables dynamic load balancing for a prefix within the default network-instance.

--{ +* candidate shared default }--[  ]--
# info with-context network-instance default ip-load-balancing dynamic-load-balancing
    network-instance default {
        ip-load-balancing {
            dynamic-load-balancing {
                prefix 10.101.12.0/24 {
                }
            }
        }
    }

Configure resource monitoring thresholds for dynamic load balancing

The following example configures the system to generate a warning message when the usage level for dynamic load balancing ECMP groups exceeds a threshold, and a notice message when the usage level for dynamic load balancing ECMP groups drops below a threshold.

--{ +* candidate shared default }--[  ]--
# info with-context platform resource-monitoring datapath asic resource dynamic-load-balancing-ecmp-groups
    platform {
        resource-monitoring {
            datapath {
                asic {
                    resource dynamic-load-balancing-ecmp-groups {
                        upper-threshold-set 75
                        upper-threshold-clear 50
                    }
                }
            }
        }
    }

In this example, a warning message is generated and used-upper-threshold-exceeded for the datapath resource is set to true whenever utilization of the resource in any line card, forwarding complex, or pipeline reaches 75% in a rising direction. A notice message is generated and used-upper-threshold-exceeded is set to false whenever utilization of the datapath resource in any line card, forwarding complex, or pipeline reaches 50% in a falling direction.
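The rising/falling threshold behavior amounts to simple hysteresis, which can be sketched as follows (an illustrative model of the set/clear semantics, not the SR Linux implementation):

```python
class ResourceMonitor:
    """Track used-upper-threshold-exceeded with set/clear hysteresis."""

    def __init__(self, upper_threshold_set=75, upper_threshold_clear=50):
        self.threshold_set = upper_threshold_set
        self.threshold_clear = upper_threshold_clear
        self.exceeded = False

    def update(self, used_percent):
        if not self.exceeded and used_percent >= self.threshold_set:
            self.exceeded = True   # rising crossing: warning generated
        elif self.exceeded and used_percent <= self.threshold_clear:
            self.exceeded = False  # falling crossing: notice generated
        return self.exceeded

m = ResourceMonitor()
assert m.update(80) is True    # reaches 75% rising: flag set
assert m.update(60) is True    # between thresholds: flag unchanged
assert m.update(40) is False   # falls to 50% or below: flag cleared
```

Keeping the clear threshold below the set threshold prevents the flag (and its messages) from flapping when utilization hovers near a single threshold.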

Display resource usage for dynamic load balancing ECMP groups

The following example shows the current resource usage for dynamic load balancing ECMP groups in the system:

--{ running }--[  ]--
# info with-context from state platform linecard 1 forwarding-complex 0 datapath asic resource dynamic-load-balancing-ecmp-groups
    platform {
        linecard 1 {
            forwarding-complex 0 {
                datapath {
                    asic {
                        resource dynamic-load-balancing-ecmp-groups {
                            used-percent 100
                            used-entries 128
                            free-entries 0
                        }
                    }
                }
            }
        }
    }

Display the port quality for the ECMP path

The following example displays the current port quality metrics of the ECMP path. In this example, two metrics are displayed for each ITM. Single ITM platforms display a metric value of 0.

--{ running }--[  ]--
# info with-context from state interface ethernet-1/1 load-balancing
    interface ethernet-1/1 {
        load-balancing {
            last-dynamic-load-balancing-quality-metrics [
                7
                7
            ]
        }
    }

Verify that dynamic load balancing is enabled for a prefix

Dynamic load balancing may become disabled for a prefix if resources such as the flowset size or the dynamic load balancing ECMP group limit are exhausted. The following example displays whether dynamic load balancing is enabled for a prefix. In the example, dynamic-load-balancing true indicates that the prefix is enabled for dynamic load balancing.

--{ running }--[  ]--
# info from state network-instance default route-table ipv4-unicast route 192.2.1.0/24 id 0 route-type static route-owner static_route_mgr origin-network-instance default
    leakable false
    leaked false
    metric 1
    preference 5
    active true
    last-app-update "2025-10-22T19:23:39.036Z (25 minutes ago)"
    next-hop-group 6181644
    next-hop-group-network-instance default
    resilient-hash false
    dynamic-load-balancing true
    fib-programming {
        suppressed false
        last-successful-operation-type add
        last-successful-operation-timestamp "2025-10-22T19:23:39.065Z (25 minutes ago)"
        pending-operation-type none
        last-failed-operation-type none
    }

In the following example, requested true indicates dynamic load balancing was configured for the prefix in the network-instance, and enabled true indicates that the prefix is programmed for dynamic load balancing. The flows-rebalanced statistic indicates the number of flows that were rebalanced from the original ECMP path assignment.

--{ + running }--[  ]--
# info from state network-instance default route-table next-hop-group 6181644
    backup-next-hop-group 0
    backup-active false
    dynamic-load-balancing {
        requested true
        enabled true
        flows-rebalanced 1290184
    }