Operational groups overview

One use of the event handler framework is operational groups (oper-group for short). The oper-group feature creates a relationship between logical elements of a network node so that they become aware of each other, forming a logical group.

For example, an oper-group can involve a set of downlink ports whose operational state (up or down) depends on the operational state of a set of uplink ports. The two sets of ports constitute the oper-group. Event handler can manage the relationship between the sets of ports in the oper-group, changing the operational state of the downlink ports as necessary.
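The dependency between the two sets of ports can be illustrated with a minimal Python sketch. This is not the event handler script interface; the `OperGroup` class, the `desired_downlink_states` function, and the port names are hypothetical, and the policy shown (downlinks go down only when every uplink is down) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperGroup:
    """Hypothetical model of an oper-group: two related sets of ports."""
    uplinks: tuple
    downlinks: tuple


def desired_downlink_states(group, oper_state):
    """Return the target state for each downlink.

    Assumed policy: downlinks stay up while at least one uplink is up,
    and are brought down only when every uplink is down.
    """
    any_uplink_up = any(oper_state.get(port) == "up" for port in group.uplinks)
    target = "up" if any_uplink_up else "down"
    return {port: target for port in group.downlinks}


# Hypothetical port names, for illustration only.
group = OperGroup(
    uplinks=("ethernet-1/49", "ethernet-1/50"),
    downlinks=("ethernet-1/1",),
)

one_uplink_left = {"ethernet-1/49": "down", "ethernet-1/50": "up"}
no_uplinks_left = {"ethernet-1/49": "down", "ethernet-1/50": "down"}

print(desired_downlink_states(group, one_uplink_left))   # {'ethernet-1/1': 'up'}
print(desired_downlink_states(group, no_uplinks_left))   # {'ethernet-1/1': 'down'}
```

A real deployment might instead require a minimum number of active uplinks before keeping the downlinks up; the sketch only captures the all-or-nothing case.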

The oper-group feature can address the issue of traffic black-holing when leaves lose all connectivity to the spine layer. Consider the following simplified Clos topology where clients are multi-homed to leaves:

Figure 1. Clos topology with multi-homed clients

With EVPN all-active multihoming enabled in a fabric, traffic from Client 1 is load-balanced over the links attached to the upstream leaves and is propagated via the fabric to its destination.

Because all links of the client's bond interface are active, traffic is hashed to each of the constituent links and therefore uses all available bandwidth. A problem occurs when a leaf loses connectivity to all upstream spines, as illustrated below:

Figure 2. Traffic disruption when a leaf loses connectivity to upstream spines

When Leaf 1 loses its uplinks, traffic from Client 1 still flows to Leaf 1 because the client is not aware of the uplink failure on the leaf. Leaf 1 has no path to the spine layer, so this traffic is black-holed on Leaf 1.

An oper-group can remedy this failure scenario by establishing a logical grouping between specific uplink and downlink interfaces on the leaves so that the operational state of the downlinks is tied to the state of the uplinks.

For this example, an oper-group is configured so that leaves shut down their downlink interfaces if they detect that the uplinks are down. This process is shown in the following diagram:

Figure 3. Using an oper-group to prevent traffic black-holing

In this example, the oper-group feature works as follows:

  1. When Leaf 1 loses its uplinks, the oper-group is notified and reacts by operationally disabling the access link toward the client.
  2. When Leaf 1's downlink transitions to the down state, Client 1's bond interface stops using that interface for hashing, and traffic moves over to the healthy links. Client 1 stops sending to Leaf 1, and all traffic flows to Leaf 2.
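The effect of step 2 on the client side can be sketched as follows. This is a simplified model, not an actual bonding implementation: the CRC32-based hash, the flow identifiers, and the link names `to-leaf1` and `to-leaf2` are all assumptions made for illustration.

```python
import zlib


def pick_link(flow_id, active_links):
    """Hash a flow onto one of the bond's active member links."""
    if not active_links:
        raise ValueError("bond has no active member links")
    index = zlib.crc32(flow_id.encode()) % len(active_links)
    return active_links[index]


def distribute(flows, link_state):
    """Map each flow to a member link, using only links that are up."""
    active = sorted(link for link, state in link_state.items() if state == "up")
    return {flow: pick_link(flow, active) for flow in flows}


flows = [f"flow-{n}" for n in range(1, 9)]

# Before the failure: both member links of Client 1's bond are up.
before = distribute(flows, {"to-leaf1": "up", "to-leaf2": "up"})

# After step 1: Leaf 1 has operationally disabled its access link,
# so the client sees the member link toward Leaf 1 as down.
after = distribute(flows, {"to-leaf1": "down", "to-leaf2": "up"})

print(before)
print(after)  # every flow now maps to "to-leaf2"
```

Because the member link toward Leaf 1 is excluded from the set of active links, no flow can hash onto it, which is what removes the black hole.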

Configuring event handler for operational groups shows how to configure the event handler framework to enable the oper-group feature.