Configuring event handler for operational groups

The key to preventing traffic from being black-holed is to avoid forwarding it to a leaf that has no active uplinks; for example, by disabling the access links as soon as the uplinks become operationally down.

The following sections provide an example of configuring the event handler for an operational group (oper-group).

Configuring the event handler instance

To configure an event handler instance for the oper-group feature:

  1. Define a set of uplinks to monitor in the paths statement.

  2. Specify downlinks (access links) and other parameters in the options statement.

  3. Provide the name of a MicroPython script in the upython-script statement.

Defining uplinks to monitor in the paths statement

In the paths statement of an event handler instance used with an oper-group, you define the set of uplinks that are necessary to provide service for a set of downlinks. The oper-group feature works by monitoring the operational state of the uplinks, and uses the state information to determine whether the operational state of the access links must be changed.

In this example, the operational state of two uplink interfaces on Leaf 1, ethernet-1/49 and ethernet-1/50, is monitored. If the uplink interfaces go operationally down, the oper-group feature brings the access interface down to avoid black-holing traffic from the client.

Figure 1. Disabling downlink based on uplink state

To monitor the operational state for a set of interfaces, configure the paths statement in an event handler instance. For example:

--{ candidate shared default }--[  ]--
# info system event-handler instance opergroup paths
    system {
        event-handler {
            instance opergroup {
                paths [
                    "interface ethernet-1/{49..50} oper-state"
                ]
            }
        }
    }

Specify the contents of the paths statement in SR Linux CLI format. In the example above, the paths statement is equivalent to the following CLI command:

--{ running }--[  ]--
# info from state interface ethernet-1/{49..50} oper-state
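The brace notation in the paths statement expands to one path per interface: {49..50} is a range, and {1,2}, as used in the full configuration later in this section, is a list. The following Python sketch illustrates that expansion with a hypothetical expand_path() helper; it only mimics the notation for illustration and is not how SR Linux itself parses paths.

```python
import re

# Matches the first innermost "{...}" group in a path pattern.
_BRACE = re.compile(r"\{([^{}]+)\}")

def expand_path(pattern):
    """Hypothetical helper: expand {a..b} ranges and {a,b} lists into paths."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    body = match.group(1)
    if ".." in body:
        # Numeric range, inclusive at both ends.
        start, end = body.split("..", 1)
        items = [str(n) for n in range(int(start), int(end) + 1)]
    else:
        # Comma-separated list of values.
        items = [item.strip() for item in body.split(",")]
    expanded = []
    for item in items:
        # Substitute this item, then recurse in case more braces remain.
        substituted = pattern[:match.start()] + item + pattern[match.end():]
        expanded.extend(expand_path(substituted))
    return expanded

print(expand_path("interface ethernet-1/{49..50} oper-state"))
# ['interface ethernet-1/49 oper-state', 'interface ethernet-1/50 oper-state']
```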

Specifying downlinks and other parameters in the options statement

The options statement in the event handler instance allows you to define objects that are passed to the script to be used as input parameters.

For an oper-group configuration, you can use the options statement to indicate the relationship between the monitored uplinks and the access links.

In the following example, the options define objects that specify the following:

  • The access links that react to state changes of the uplinks
  • The minimum number of uplinks that must be operationally up for the access links to stay up
  • Whether the script prints debug output

--{ candidate shared default }--[  ]--
# info system event-handler instance opergroup options
    system {
        event-handler {
            instance opergroup {
                options {
                    object down-links {
                        values [
                            ethernet-1/1
                        ]
                    }
                    object required-up-uplinks {
                        value 1
                    }
                    object debug {
                        value true
                    }
                }
            }
        }
    }

In this example, the down-links object specifies an interface name. When the object is passed to the script, it can be used as a parameter indicating the access link associated with the uplinks. For this oper-group configuration, the down-links object indicates the interface for which the operational state depends on the state of the uplinks defined in the paths statement.

The required-up-uplinks object specifies the minimum number of uplinks that must be operationally up for the access link to stay up; if fewer uplinks are up, the access link is brought down. For this oper-group configuration, the value is 1, which means that at least one uplink must be up. The script counts the uplinks that are operationally up and compares that number to the value in the required-up-uplinks object.

The debug object is set to true, which directs the script to print the values of certain script variables.
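In the script input, these objects arrive as JSON strings: single-value objects such as required-up-uplinks and debug are strings, and multi-value objects such as down-links are lists of strings (see Script input). The following is a minimal sketch of converting them to native Python types, using the same defaults as the oper-group.py script; the variable names are illustrative.

```python
import json

# Options as they appear in the script input: single values are strings,
# multi-value objects are lists of strings.
options = json.loads(
    '{"debug": "true", "required-up-uplinks": "1", "down-links": ["ethernet-1/1"]}'
)

# Convert to native types, falling back to the defaults used by oper-group.py
# when an object is not configured (hold-down-time is absent here).
required_up = int(options.get("required-up-uplinks", "1"))
hold_down_ms = int(options.get("hold-down-time", "0"))
debug = options.get("debug") == "true"
down_links = options.get("down-links", [])

print(required_up, hold_down_ms, debug, down_links)
# 1 0 True ['ethernet-1/1']
```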

Specifying script name in the upython-script statement

The upython-script statement in the event handler instance specifies the name of the MicroPython script to be invoked when SR Linux detects a change in the interfaces defined in the paths statement.

The MicroPython script must reside in one of the following locations:

  • /etc/opt/srlinux/eventmgr for user-provided scripts
  • /opt/srlinux/eventmgr for Nokia-provided scripts

--{ candidate shared default }--[  ]--
# info system event-handler instance opergroup upython-script
    system {
        event-handler {
            instance opergroup {
                upython-script oper-group.py
            }
        }
    }

For this oper-group configuration, whenever the oper-state of any interface defined in the paths statement changes, the event handler invokes the oper-group.py script.

Event handler oper-group configuration

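When administratively enabled, a full configuration for an oper-group event handler instance looks like the following. This configuration differs from the example above: it monitors ethernet-1/1 and ethernet-1/2, controls downlinks ethernet-1/20 and ethernet-1/22, requires both uplinks to be up, and sets the optional hold-down-time object, which the script uses to delay the down-to-up transition of the downlinks by the specified number of milliseconds (5000 in this case).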

--{ candidate shared default }--[  ]--
# info system event-handler instance oper-group
    system {
        event-handler {
            instance oper-group {
                admin-state enable
                upython-script oper-group.py
                paths [
                    "interface ethernet-1/{1,2} oper-state"
                ]
                options {
                    object down-links {
                        values [
                            ethernet-1/{20,22}
                        ]
                    }
                    object hold-down-time {
                        value 5000
                    }
                    object required-up-uplinks {
                        value 2
                    }
                }
            }
        }
    }

MicroPython script for oper-group

When there is a state change in any of the paths defined in the paths statement of the event handler instance, the script defined in the upython-script statement is invoked. The event handler calls the event_handler_main() function in the script, passing it a JSON string that contains the current state of the monitored paths, the object:value pairs defined in the options statement, and any persistent data returned by the previous execution of the script.

The script processes this input and returns a JSON string containing a list of actions, along with any persistent data to pass to the next execution.
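The calling convention can be illustrated with a minimal skeleton of event_handler_main(). It reads the same input keys that oper-group.py reads and returns the same top-level keys. The run-count persistent-data key is hypothetical, and the sketch assumes that an empty actions list simply requests no changes.

```python
import json

def event_handler_main(in_json_str):
    # Input: JSON string with "paths", "options" and, after the first run,
    # any "persistent-data" returned by the previous execution.
    in_json = json.loads(in_json_str)
    paths = in_json["paths"]
    persist = in_json.get("persistent-data", {})

    # Example processing: print each monitored path and count executions
    # using a hypothetical "run-count" key.
    for path in paths:
        print(path["path"], "=", path["value"])
    run_count = int(persist.get("run-count", "0")) + 1

    # Output: JSON string with an "actions" list (empty here) and the data
    # to hand back on the next execution.
    return json.dumps({"actions": [], "persistent-data": {"run-count": str(run_count)}})
```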

Script input

For the example in Configuring the event handler instance, the input JSON string consists of the current state of the two uplinks and the configured options. The following JSON string is passed to the oper-group.py script if the operational state of interface ethernet-1/49 changes to down:

{
    "paths": [
        {
            "path": "interface ethernet-1/49 oper-state",
            "value": "down"
        },
        {
            "path": "interface ethernet-1/50 oper-state",
            "value": "up"
        }
    ],
    "options": {
        "debug": "true",
        "required-up-uplinks": "1",
        "down-links": [
            "ethernet-1/1"
        ]
    }
}
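Applying the script's counting rule to this input shows why the access link stays up: one uplink is up and one is required. The standalone check below uses a one-line equivalent of count_up_uplinks() and the same comparison as the script.

```python
import json

# The script input shown above, for ethernet-1/49 going down.
in_json_str = """
{
    "paths": [
        {"path": "interface ethernet-1/49 oper-state", "value": "down"},
        {"path": "interface ethernet-1/50 oper-state", "value": "up"}
    ],
    "options": {
        "debug": "true",
        "required-up-uplinks": "1",
        "down-links": ["ethernet-1/1"]
    }
}
"""

in_json = json.loads(in_json_str)
# Same counting rule as count_up_uplinks() in oper-group.py.
num_up = sum(1 for p in in_json["paths"] if p.get("value", "down") == "up")
required = int(in_json["options"].get("required-up-uplinks", "1"))
print(num_up, required, "up" if required <= num_up else "down")
# 1 1 up
```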

Script processing

The following is the oper-group.py script referenced in the event handler instance.

import json

def count_up_uplinks(paths):
    # Count the monitored uplinks whose oper-state is "up"
    up_cnt = 0
    for path in paths:
        if path.get('value', 'down') == 'up':
            up_cnt += 1
    return up_cnt

def required_up_uplinks(options):
    # Minimum number of up uplinks for the downlinks to stay up (default 1)
    return int(options.get('required-up-uplinks', '1'))

def hold_time(options):
    # Optional delay in ms for the down->up transition (default 0, no hold)
    return int(options.get('hold-down-time', '0'))

def bool_to_oper_state(val):
    # Map True/False to "up"/"down"
    return ('down', 'up')[bool(val)]

def event_handler_main(in_json_str):
    in_json = json.loads(in_json_str)
    paths = in_json['paths']
    options = in_json['options']
    persist = in_json.get('persistent-data', {})

    num_up_uplinks = count_up_uplinks(paths)
    downlink_should_be_up = required_up_uplinks(options) <= num_up_uplinks
    needs_hold_down = False

    # down->up transition will be held for optional hold-time
    if (hold_time(options) > 0) and downlink_should_be_up:
        needs_hold_down = persist.get("last-state", "up") == "down"

    if options.get("debug") == "true":
        print(
            f"hold down time = {hold_time(options)}ms\n\
num of required up uplinks = {required_up_uplinks(options)}\n\
detected num of up uplinks = {num_up_uplinks}\n\
downlinks new state = {bool_to_oper_state(downlink_should_be_up)}\n\
needs_hold_down = {str(needs_hold_down)}"
        )

    response_actions = []

    # Keep the downlinks down while a hold-down is in progress
    oper_state_str = bool_to_oper_state(not needs_hold_down and downlink_should_be_up)
    for downlink in options.get('down-links', []):
        response_actions.append({'set-ephemeral-path': {'path': 'interface {0} oper-state'.format(downlink), 'value': oper_state_str}})

    # Ask the event handler to run the script again once the hold time expires
    if needs_hold_down:
        response_actions.append({'reinvoke-with-delay': hold_time(options)})
    # Remember the computed state for the next execution
    response_persistent_data = {'last-state': bool_to_oper_state(downlink_should_be_up)}

    response = {'actions': response_actions, 'persistent-data': response_persistent_data}
    return json.dumps(response)

The following sections describe how each part of the script processes the input for this oper-group example.

Parsing input JSON

Starting with the event_handler_main function, the incoming JSON string is parsed and the relevant portions are extracted.

def event_handler_main(in_json_str):
    in_json = json.loads(in_json_str)
    paths = in_json['paths']
    options = in_json['options']
    persist = in_json.get('persistent-data', {})

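The paths, options, and persistent-data objects in the incoming JSON string are saved in the paths, options, and persist variables, respectively. If no persistent data was returned by a previous execution, persist defaults to an empty dictionary.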
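Between parsing the input and printing the debug log, the script decides the new downlink state and applies the optional hold-down-time. The sketch below copies that logic into a hypothetical decide() helper, without the JSON handling, and traces it with the values from Event handler oper-group configuration (two required uplinks, 5000 ms hold-down). The third call stands for the reinvocation that the reinvoke-with-delay action requests.

```python
def decide(num_up, required, hold_ms, persist):
    """Hypothetical copy of the decision logic in event_handler_main()."""
    should_be_up = required <= num_up
    needs_hold_down = False
    # A down->up transition is held only if a hold time is configured and
    # the previous execution recorded the downlinks as down.
    if hold_ms > 0 and should_be_up:
        needs_hold_down = persist.get("last-state", "up") == "down"
    oper_state = "up" if (should_be_up and not needs_hold_down) else "down"
    # last-state records the computed state, even while the hold is active.
    new_persist = {"last-state": "up" if should_be_up else "down"}
    return oper_state, needs_hold_down, new_persist

# Trace: both uplinks fail, then recover, then the delayed reinvocation runs.
persist = {}
for step, num_up in enumerate([0, 2, 2]):
    state, hold, persist = decide(num_up, 2, 5000, persist)
    print(step, num_up, state, hold, persist["last-state"])
# 0 0 down False down
# 1 2 down True up
# 2 2 up False up
```

When the uplinks recover, the downlinks stay down for one more execution and only come up when the script is reinvoked with last-state already set to up.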

Populating the debug log

The debug option causes the script variables to appear in the debug log.

    if options.get("debug") == "true":
        print(
            f"hold down time = {hold_time(options)}ms\n\
num of required up uplinks = {required_up_uplinks(options)}\n\
detected num of up uplinks = {num_up_uplinks}\n\
downlinks new state = {bool_to_oper_state(downlink_should_be_up)}\n\
needs_hold_down = {str(needs_hold_down)}"
        )

The debug log is present only if the debug option is set to "true" in the event handler instance configuration.

You can display the debug log by using the following CLI command:

--{ running }--[  ]--
# info from state system event-handler instance opergroup last-execution stdout-stderr
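For the sample input in Script input, which configures no hold-down-time and has one of two uplinks up, the debug output of the oper-group.py script listed above would read as follows. The sketch rebuilds the same text from the computed values.

```python
# Values the script computes for the sample input in Script input.
hold_down_ms = 0        # hold-down-time not configured
required_up = 1         # required-up-uplinks
num_up = 1              # ethernet-1/50 up, ethernet-1/49 down
downlinks_state = "up" if required_up <= num_up else "down"
needs_hold_down = False

# Same lines, in the same order, as the debug print in oper-group.py.
debug_text = (
    f"hold down time = {hold_down_ms}ms\n"
    f"num of required up uplinks = {required_up}\n"
    f"detected num of up uplinks = {num_up}\n"
    f"downlinks new state = {downlinks_state}\n"
    f"needs_hold_down = {needs_hold_down}"
)
print(debug_text)
```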

Composing output

At this point, the script has determined the correct state for the downlinks, based on the state of the monitored uplinks and the required number of healthy uplinks. For the event handler to take action, the script must output a JSON string that follows the format defined in Actions.

    response_actions = []

    oper_state_str = bool_to_oper_state(not needs_hold_down and downlink_should_be_up)
    for downlink in options.get('down-links', []):
        response_actions.append({'set-ephemeral-path' : {'path':'interface {0} oper-state'.format(downlink),'value':oper_state_str}})

    if needs_hold_down:
        response_actions.append({'reinvoke-with-delay' : hold_time(options)})
    response_persistent_data = {'last-state':bool_to_oper_state(downlink_should_be_up)}

    response = {'actions':response_actions,'persistent-data':response_persistent_data}
    return json.dumps(response)

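The script builds the output from oper_state_str, which is calculated from downlink_should_be_up and needs_hold_down, and from the list of downlinks provided in the down-links option.

The output JSON string contains one set-ephemeral-path action per downlink, which sets the oper-state of that downlink to the correct value (up or down). While a hold-down is in progress, it also contains a reinvoke-with-delay action, which requests that the script be run again after the hold time. The persistent-data object records the computed state as last-state for the next execution.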

The output is provided via the response dictionary, and is JSON-encoded before returning from the function. This routine provides a JSON string back to the event handler, which processes and executes the actions passed to it.

The result of this processing shows the implementation of the oper-group feature: the event handler executes actions to set the state of a downlink based on the state of a group of uplinks.
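The output stage can be exercised on its own. The hypothetical build_response() helper below mirrors how oper-group.py assembles the response, and the example call shows the response for the downlinks in Event handler oper-group configuration while a 5000 ms hold-down is in progress.

```python
import json

def build_response(down_links, oper_state, needs_hold_down, hold_ms, last_state):
    """Hypothetical helper mirroring the output stage of oper-group.py."""
    # One set-ephemeral-path action per downlink.
    actions = [
        {"set-ephemeral-path": {"path": "interface {0} oper-state".format(link),
                                "value": oper_state}}
        for link in down_links
    ]
    # While a down->up transition is held, ask to be reinvoked later.
    if needs_hold_down:
        actions.append({"reinvoke-with-delay": hold_ms})
    return json.dumps({"actions": actions, "persistent-data": {"last-state": last_state}})

# Uplinks have just recovered and the 5000 ms hold-down is in effect.
print(build_response(["ethernet-1/20", "ethernet-1/22"], "down", True, 5000, "up"))
```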

Displaying oper-group information

When an event handler instance is configured and administratively enabled, the event handler performs an initial sync of the state of the monitored paths. As a result, the event handler attempts to execute the script as soon as it receives the initial state of the monitored paths, without waiting for a state change.

You can display the status of an event handler instance by querying the state datastore. For example:

--{ running }--[  ]--
# info from state system event-handler instance opergrp
    system {
        event-handler {
            instance opergrp {
                admin-state enable
                upython-script oper-group.py
                oper-state up
                paths [
                    "interface ethernet-1/1 oper-state"
                    "interface ethernet-1/4 oper-state"
                ]
                options {
                    object down-links {
                        values [
                            ethernet-1/3
                            ethernet-1/8
                        ]
                    }
                    object required-num-up-links {
                        value 2
                    }
                }
                last-execution {
                    start-time now
                    end-time now
                    upython-duration 1
                    input "{\"paths\":[{\"path\":\"interface ethernet-1/1 oper-state\",\"value\":\"up\"},{\"path\":\"interface ethernet-1/4 oper-state\",\"value\":\"up\"}],\"options\":{\"down-links\":[\"ethernet-1/3\",\"ethernet-1/8\"],\"required-num-up-links\":\"2\"},\"persistent-data\":{\"last-state\":\"up\"}}"
                    output "{\"actions\": [{\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/3 oper-state\", \"value\": \"up\"}}, {\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/8 oper-state\", \"value\": \"up\"}}], \"persistent-data\": {\"last-state\": \"up\"}}"
                    stdout-stderr ""
                }
                last-errored-execution {
                    oper-down-reason admin-disabled
                    oper-down-reason-detail ""
                    start-time "26 seconds ago"
                    end-time "25 seconds ago"
                    upython-duration 0
                    input "{\"paths\":[{\"path\":\"interface ethernet-1/1 oper-state\",\"value\":\"up\"},{\"path\":\"interface ethernet-1/4 oper-state\",\"value\":\"down\"}],\"options\":{\"down-links\":[\"ethernet-1/3\",\"ethernet-1/8\"],\"required-num-up-links\":\"2\"},\"persistent-data\":{\"last-state\":\"down\"}}"
                    output "{\"actions\": [{\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/3 oper-state\", \"value\": \"up\"}}, {\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/8 oper-state\", \"value\": \"up\"}}], \"persistent-data\": {\"last-state\": \"up\"}}"
                    stdout-stderr ""
                }
                statistics {
                    upython-duration 516
                    execution-count 1643
                    execution-successes 1642
                    execution-errors 1
                }
            }
        }
    }

This command displays the following information:

  • oper-state

    The operational state of the event handler instance. If there are errors in the script or the configuration, the state is down.

  • last-execution

    Information about the most recent time the script was executed.

  • last-errored-execution

    Information about the last time the script was executed with an error result. This includes the oper-down-reason, the oper-down-reason-detail, the input and output JSON strings, and the print statements and log messages that the script sent to stdout-stderr.

  • statistics

    Statistics related to the execution process.
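When troubleshooting, it can help to decode the input and output strings reported under last-execution or last-errored-execution. The following is a minimal sketch, assuming the string has been copied with the CLI escaping (\") removed; it uses the output value from the example above.

```python
import json

# The "output" value from last-execution above, with CLI escaping removed.
output = (
    '{"actions": [{"set-ephemeral-path": {"path": "interface ethernet-1/3 oper-state", "value": "up"}}, '
    '{"set-ephemeral-path": {"path": "interface ethernet-1/8 oper-state", "value": "up"}}], '
    '"persistent-data": {"last-state": "up"}}'
)

result = json.loads(output)
# List each action the script asked the event handler to execute.
for action in result["actions"]:
    for name, arg in action.items():
        print(name, arg)
print("persistent-data:", result["persistent-data"])
```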

For the oper-group example, the following output is displayed if one of the uplinks goes down:

--{ running }--[  ]--
# info from state system event-handler instance opergrp
    system {
        event-handler {
            instance opergrp {
                admin-state enable
                upython-script oper-group.py
                oper-state up
                last-execution {
                    start-time now
                    end-time now
                    upython-duration 1
                    input "{\"paths\":[{\"path\":\"interface ethernet-1/1 oper-state\",\"value\":\"up\"},{\"path\":\"interface ethernet-1/4 oper-state\",\"value\":\"up\"}],\"options\":{\"down-links\":[\"ethernet-1/3\",\"ethernet-1/8\"],\"required-num-up-links\":\"2\"},\"persistent-data\":{\"last-state\":\"up\"}}"
                    output "{\"actions\": [{\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/3 oper-state\", \"value\": \"up\"}}, {\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/8 oper-state\", \"value\": \"up\"}}], \"persistent-data\": {\"last-state\": \"up\"}}"
                    stdout-stderr "num of required up uplinks = 1
detected num of up uplinks = 1
downlinks new state = up"
                }
            }
        }
    }

The value of downlinks new state is up because the detected num of up uplinks (1) did not drop below the required number of 1. The downlink interface therefore remains operationally up.

If both of the uplinks go down, the following is displayed:

--{ running }--[  ]--
# info from state system event-handler instance opergrp
    system {
        event-handler {
            instance opergrp {
                admin-state enable
                upython-script oper-group.py
                oper-state up
                last-execution {
                    start-time now
                    end-time now
                    upython-duration 1
                    input "{\"paths\":[{\"path\":\"interface ethernet-1/1 oper-state\",\"value\":\"up\"},{\"path\":\"interface ethernet-1/4 oper-state\",\"value\":\"up\"}],\"options\":{\"down-links\":[\"ethernet-1/3\",\"ethernet-1/8\"],\"required-num-up-links\":\"2\"},\"persistent-data\":{\"last-state\":\"up\"}}"
                    output "{\"actions\": [{\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/3 oper-state\", \"value\": \"up\"}}, {\"set-ephemeral-path\": {\"path\": \"interface ethernet-1/8 oper-state\", \"value\": \"up\"}}], \"persistent-data\": {\"last-state\": \"up\"}}"
                    stdout-stderr "num of required up uplinks = 1
detected num of up uplinks = 0
downlinks new state = down"
                }
            }
        }
    }

The detected num of up uplinks is 0, which is below the required number of 1. As a result, the event handler sets the downlink interface to operationally down.

In this way, the event handler implements the oper-group feature: it disables the access link when the uplink interfaces go down, preventing traffic from being black-holed.