System management

This chapter contains procedures for setting up basic system management functions on SR Linux, including the hostname, domain name, DNS settings, and management network-instance. It includes examples of configuring an SSH server, an FTP server, and NTP for the system clock, as well as enabling an SNMP server.

Configuring a hostname

The SR Linux device must have a hostname configured. The default hostname is srlinux. The hostname normally appears on all CLI prompts on the device, although you can override this with the environment prompt CLI command.

The hostname should be a unique name on the network, and can be a fully qualified domain name (FQDN), or an unqualified single-label name. If the hostname is a single-label name (for example, srlinux), the system may use its domain name, if configured, to infer its own FQDN.

The following example shows the configuration for a hostname on the SR Linux device.

--{ candidate shared default }--[  ]--
# info system name
    system {
        name {
            host-name 3-node_srlinux-A
        }
    }
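
The same configuration can be entered interactively from the CLI. The following is a minimal sketch: it assumes a candidate session opened with the enter candidate command, and the prompt transitions shown are illustrative.

--{ running }--[  ]--
# enter candidate
--{ candidate shared default }--[  ]--
# system name host-name 3-node_srlinux-A
--{ * candidate shared default }--[  ]--
# commit now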

Configuring a domain name

The SR Linux device combines its hostname with a domain name to form its fully qualified domain name (FQDN). It is expected that the FQDN exists within the DNS servers used by SR Linux, though this is not a requirement.

Assuming the SR Linux FQDN is in the DNS server, you can use the FQDN to reach the SR Linux device without knowing its management address. A domain name is not mandatory, but if specified, it is added to the DNS search list by default.

The following shows the configuration for a domain name on the SR Linux device. In this example, the device FQDN is set to 3-node_srlinux-A.mv.usa.nokia.com.

--{ candidate shared default }--[  ]--
# info system name
    system {
        name {
            host-name 3-node_srlinux-A
            domain-name mv.usa.nokia.com
        }
    }

Configuring DNS settings

The SR Linux device uses DNS to resolve hostnames within the configuration, or for operational commands, such as ping. You can specify up to three DNS servers for the SR Linux device to use, with either IPv4 or IPv6 addressing.

You can also specify a search list of DNS suffixes that the device can use to resolve single-label names; for example, for a search list of nokia1.com and nokia2.com, a ping for host srlinux does a DNS lookup for srlinux.nokia1.com, and if unsuccessful, does a DNS lookup for srlinux.nokia2.com.

The SR Linux device supports configuration of static DNS entries. Static DNS entries allow resolution of hostnames that may not be in the DNS servers used by the SR Linux device. Using a static DNS entry, you can map multiple addresses (both IPv4 and IPv6) to one hostname. The SR Linux linux_mgr application adds the static DNS entries to the /etc/hosts file in the underlying Linux OS.

In the following example, the SR Linux device is configured to use two DNS servers to resolve hostnames, a search list of DNS suffixes for resolving single-label names, and IPv4 and IPv6 static DNS entries for a host.

DNS requests are sourced from the mgmt network-instance (see Configuring the management network-instance).

--{ candidate shared default }--[  ]--
# info system dns
    system {
        dns {
            network-instance mgmt
            server-list [
                192.0.2.1
                192.0.2.2
            ]
            search-list [
                nokia1.com
                nokia2.com
            ]
            host-entry srlinux.nokia.com {
                ipv4-address 192.0.2.3
                ipv6-address 2001:db8:123:456::11:11
            }
        }
    }
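
With this configuration committed, name resolution can be verified from the CLI; for example, pinging the static host entry should resolve to the configured address. The output below is abridged and illustrative.

--{ running }--[  ]--
# ping srlinux.nokia.com network-instance mgmt
Using network instance mgmt
PING srlinux.nokia.com (192.0.2.3) 56(84) bytes of data.
...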

Configuring the management network-instance

Management of the SR Linux device is primarily done via a management network-instance. The management network-instance isolates management traffic from other network-instances configured on the device.

The out-of-band mgmt0 interface is automatically added to the management network-instance, and management services run within the management network-instance.

Although the management network-instance is primarily intended to handle management traffic, you can configure it in the same way as any other network-instance on the device, including protocols, policies, and filters. The management network-instance is part of the default configuration, but can be deleted if necessary.

Addressing within the management network-instance is available via DHCP and static IP addresses. Both IPv4 and IPv6 addresses are supported.

--{ candidate shared default }--[  ]--
# info network-instance mgmt
    network-instance mgmt {
        type ip-vrf
        admin-state enable
        description "Management network instance"
        interface mgmt0.0 {
        }
        protocols {
            linux {
                export-routes true
                export-neighbors true
            }
        }
    }
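
The mgmt0.0 subinterface referenced above is addressed under interface mgmt0. The following sketch shows DHCP-based addressing for both address families; this is an assumption of a typical setup rather than a verbatim default, and the dhcp-client syntax can vary by release.

--{ candidate shared default }--[  ]--
# info interface mgmt0
    interface mgmt0 {
        admin-state enable
        subinterface 0 {
            admin-state enable
            ipv4 {
                dhcp-client {
                }
            }
            ipv6 {
                dhcp-client {
                }
            }
        }
    }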

Access types

Access to the SR Linux device is available via a number of APIs and protocols. The SR Linux supports the following ways to access the device:

  • SSH – Secure Shell, a standard method for accessing network devices. See Enabling an SSH server.

  • FTP – File Transfer Protocol, a standard method for transferring files to and from network devices. See Configuring FTP.

  • Console – Access to the SR Linux CLI via direct connection to a serial port on the device.

  • gNMI – A gRPC-based protocol for the modification and retrieval of configuration from a target device, as well as the control and generation of telemetry streams from a target device to a data collection system. See gNMI server.

  • JSON-RPC – Ability to retrieve and set configuration and state using a JSON-RPC API. See JSON-RPC server.

  • SNMP – Simple Network Management Protocol, a commonly used network management protocol. The SR Linux device supports SNMPv2 with a limited set of OIDs.

Regardless of the access method, all sessions are authenticated (if authentication is enabled), whether the session is initiated via the console, SSH, or an API. Access to the device is controlled via the aaa_mgr application. See Securing access.

Enabling an SSH server

You can enable an SSH server for one or more network instances on the SR Linux device, so that users can log in to the CLI using an SSH client. The SR Linux device implements SSH via OpenSSH, and configures /etc/ssh/sshd_config in the underlying Linux OS. Only SSHv2 is supported.

In the following example, an SSH server is enabled in the mgmt and default network-instances, specifying the IP addresses where the device listens for SSH connections:

--{ candidate shared default }--[  ]--
# info system ssh-server
    system {
        ssh-server {
            network-instance mgmt {
                admin-state enable
                source-address [
                    192.0.2.1
                    192.0.2.2
                ]
            }
            network-instance default {
                admin-state enable
                source-address [
                    192.0.2.3
                    192.0.2.4
                ]
            }
        }
    }
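
To confirm that the SSH server is listening, the configuration can be read back together with its operational state using info from state. The output below is abridged; the exact state leaves (such as the oper-state shown here) are assumptions and vary by release.

--{ running }--[  ]--
# info from state system ssh-server network-instance mgmt
    system {
        ssh-server {
            network-instance mgmt {
                admin-state enable
                oper-state up
                ...
            }
        }
    }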

Configuring SSH key-based authentication

The SR Linux SSH server supports RSA public/private key-based authentication, where an SSH client proves its identity with a message signed by its private key. If the SSH client's corresponding public key is configured on the SR Linux, the SSH server can authenticate the client.

When performing authentication for a user, the SR Linux first tries public-key authentication; if this fails, the SR Linux tries password authentication.

To configure SSH key-based authentication, you generate a public-private key pair, then add the public key to the SR Linux.

The following is an example of using the ssh-keygen utility in Linux to generate an RSA key pair with a length of 2,048 bits:

# ssh-keygen -t rsa -b 2048
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:RNVV8/XRVK7PhY2OJxa7rjkSUFqyVoj4pUXL2PDs7mI user@linux
The key's randomart image is:
+---[RSA 2048]----+
|       ...       |
+----[SHA256]-----+

After generating the RSA key pair, you can add the public key to the SR Linux. The location for the public key depends on the type of user for which SSH key-based authentication is being configured:

  • For Linux users (see Linux users), you add the public key to the user’s $HOME/.ssh/authorized_keys file.

  • For users configured within the SR Linux CLI (see Local users), you add the public key to the SR Linux configuration file. This can be done with a CLI command.

For example, the following CLI command configures a public key and password for the SR Linux user srlinux:

--{ candidate shared default }--[  ]--
# system aaa authentication user username srlinux ssh-key [ <public-key> ] password <password>

In the example, the <public-key> has the format ssh-rsa <key> <comment>. If multiple public keys are configured for a user, they are tried in the order they were configured.
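
After this command is committed, the key is stored with the user's configuration. The following sketch shows the expected shape of the result; the rendering of the user list is assumed, and the password hash and key value are placeholders.

--{ candidate shared default }--[  ]--
# info system aaa authentication user srlinux
    system {
        aaa {
            authentication {
                user srlinux {
                    password $6$<hash>
                    ssh-key [
                        "ssh-rsa <key> user@linux"
                    ]
                }
            }
        }
    }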

SSH host keys

An SSH host key pair is generated when an SSH server is enabled for a network-instance on an SR Linux device; the server presents the public key of this pair to clients as its identification.

SSH clients store the public host keys of the servers to which they are connected.

A host key that is stored is referred to as a known host key. The /etc/ssh/known_hosts file and the .ssh/known_hosts file in the user's home directory contain the host public keys for all known hosts.

In SR Linux, the server-side keys are stored in /etc/ssh, and the filenames begin with ssh_host_[dsa/ecdsa/ed25519/rsa]_key. For information about host key authentication and preservation, see Host key authentication and preservation.
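
From the SR Linux bash shell, the presence of these files can be verified directly. The listing below is illustrative; the set of key types depends on the release.

--{ running }--[  ]--
# bash
$ ls /etc/ssh/ssh_host_*
/etc/ssh/ssh_host_ecdsa_key      /etc/ssh/ssh_host_ed25519_key.pub
/etc/ssh/ssh_host_ecdsa_key.pub  /etc/ssh/ssh_host_rsa_key
/etc/ssh/ssh_host_ed25519_key    /etc/ssh/ssh_host_rsa_key.pub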

Host key authentication and preservation

When an SSH client connects to a server, the server offers its host key as identification. If this is the first time the user has connected to the server, the client prompts the user to accept the host key. After the user accepts the key, the host key and the hostname used to connect to the server are appended to the list of known hosts. In subsequent connections to the same server, the SSH client expects the server to present the same host key.

If the host key presented by the server on a subsequent connection is different from the one saved on the user’s local system, the SSH client refuses to proceed with the connection, and instead, displays the following warning message:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:4xnagTqYdQyLGJl0XhvAvyBcrRL2nZ8vSRXTYfcIYe0.
Please contact your system administrator.
Add correct host key in /Users/test_user/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /Users/test_user/.ssh/known_hosts:74
Host key for 10.0.0.1 has changed and you have requested strict checking.
Host key verification failed.

The message indicates either that the server may be impersonating the intended server to intercept passwords, or simply that the host key has changed.

Without correct verification techniques, a periodic change in an SSH host key is indistinguishable from a man-in-the-middle (MITM) attack. Users may be tempted to accept frequent warning messages without verifying them, which can increase the vulnerability to MITM attacks instead of reducing it.

To overcome this, you can enable the preserve option for the SSH server host key. This ensures the SSH server host keys in /etc/ssh are preserved and restored after a system reboot or SSH server restart. When the preserve option is not enabled, the SSH server host key remains valid only until the node is restarted or the SSH server is stopped and restarted.

Note:

This feature is not backward compatible with releases earlier than 23.7.

For information about enabling the preserve option to prevent SSH host key regeneration upon reboot, see Preserving SSH host key.

Preserving SSH host key

To prevent SSH server host key regeneration upon reboot, set the preserve option for SSH under system ssh-server host-key.

The following example sets the preserve option for SSH:

--{ * candidate shared default }--[ ]--
# info system ssh-server host-key
    system {
        ssh-server {
            host-key {
                preserve true
            }
        }
    }

  • When the preserve option is set to true, the SSH server host keys in /etc/ssh are saved and restored after a system reboot or SSH server restart.

  • When the preserve option is set to false, the SSH server host keys in /etc/ssh are removed and regenerated on each system reboot or SSH server restart.

Configuring FTP

You can enable an FTP server for one or more network instances on the SR Linux device, so that users can transfer files to and from the device. The SR Linux uses the vsftpd (very secure FTP daemon) application within the underlying Linux OS. The authenticated user's home directory returned by the aaa_mgr application is set as the user's FTP root directory.

In the following example, the FTP server is enabled in the mgmt and default network-instances, specifying the IP addresses where the device listens for FTP connections:

--{ candidate shared default }--[  ]--
# info system ftp-server
    system {
        ftp-server {
            network-instance mgmt {
                admin-state enable
                source-address [
                    192.0.2.1 
                ]
            }
            network-instance default {
                admin-state enable
                source-address [
                   192.0.2.4 
                ]
            }
        }
    }

Configuring banners

You can specify banner text that appears when a user connects to the SR Linux device. The following banners can be configured:

  • Login banner – Displayed before a user has been authenticated by the system (for example, at the SSH login prompt)

  • Message of the day (motd) banner – Displayed after the user has been authenticated by the system

The banners appear regardless of the method used to connect to the SR Linux, so they are displayed to users connecting via SSH, console, and so on.

In the following example, login and motd banners are configured. The login banner text appears at the prompt when a user attempts to log in to the system, and the motd banner text appears after the user has been authenticated.

--{ candidate shared default }--[  ]--
# info system banner
    system {
        banner {
            login-banner "Enter your SRLinux login credentials."
            motd-banner "Welcome to the SRLinux CLI. Your activity may be monitored."
        }
    }
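
With this configuration, an SSH session to the device displays the two banners in sequence, similar to the following (illustrative; the exact prompts depend on the SSH client used):

$ ssh admin@192.0.2.1
Enter your SRLinux login credentials.
admin@192.0.2.1's password:
Welcome to the SRLinux CLI. Your activity may be monitored.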

Synchronizing the system clock

Network Time Protocol (NTP) is used to synchronize the system clock to a time reference. You can configure NTP settings on the SR Linux device using the CLI, and the SR Linux linux_mgr application provisions the settings in the underlying Linux OS.

NTP does not account for time zones, instead relying on the host to perform such computations. Time zones on the SR Linux device are based on the IANA tz database, which is implemented by the underlying Linux OS. You can specify the time zone of the SR Linux device using the CLI.

The following configuration enables the system NTP client on the SR Linux device and specifies an NTP server to use for clock synchronization. The NTP client runs in the mgmt network-instance. The system time zone is set to America/Los_Angeles.

--{ candidate shared default }--[  ]--
# info system ntp
    system {
        ntp {
            admin-state enable
            network-instance mgmt
            server 4.53.160.75 {
            }
        }
        clock {
            timezone America/Los_Angeles
        }
    }
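
After the configuration is committed, NTP status can be checked with info from state, which returns the configuration augmented with operational leaves (for example, whether the clock is synchronized; the exact leaf names vary by release):

--{ running }--[  ]--
# info from state system ntp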

Configuring SNMP

To configure SNMP, enable an SNMP server on one or more network-instances. The SR Linux device supports SNMPv2 with a limited set of OIDs; the MIB file that covers these OIDs is packaged with each release.

See the SR Linux System Management Guide for the procedure for configuring an SNMP server.
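
As a sketch only (the guide referenced above is authoritative), enabling the SNMP server in the mgmt network-instance is assumed to follow the same pattern as the SSH and FTP servers shown earlier:

--{ candidate shared default }--[  ]--
# info system snmp
    system {
        snmp {
            network-instance mgmt {
                admin-state enable
            }
        }
    }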

IP ECMP Load Balancing

Equal-cost multipath (ECMP) refers to the distribution of packets over two or more outgoing links that share the same routing cost. Static, IS-IS, OSPF, and BGP routes to IPv4 and IPv6 destinations can be programmed into the datapath by their respective applications, with multiple IP ECMP next-hops.

The SR Linux load-balances traffic over multiple equal-cost links with a hashing algorithm that uses header fields from incoming packets to calculate which link to use. When an IPv4 or IPv6 packet is received on a subinterface, and it matches a route with a number of IP ECMP next-hops, the next-hop that forwards the packet is selected based on a computation using this hashing algorithm. The goal of the hash computation is to keep packets in the same flow on the same network path, while distributing traffic proportionally across the ECMP next-hops, so that each of the N ECMP next-hops carries approximately 1/Nth of the load.

The hash computation takes various key and packet header field values as inputs and returns a value that indicates the next-hop. The key and field values that can be used by the hash computation depend on the platform, packet type, and configuration options, as follows:

On 7250 IXR systems, the following can be used in the hash computation:

  • For IPv4 TCP/UDP non-fragmented packets: user-configured hash-seed (0-65535; default 0), source IPv4 address, destination IPv4 address, IP protocol, L4 source port, L4 destination port. The algorithm is asymmetric; that is, inverting source and destination pairs does not produce the same result.
  • For IPv6 TCP/UDP non-fragmented packets: user-configured hash-seed (0-65535; default 0), source IPv6 address, destination IPv6 address, IPv6 flow label (even if it is 0), IP protocol (IPv6 next-header value in the last extension header), L4 source port, L4 destination port. The algorithm is symmetric; that is, inverting source and destination pairs produces the same result.
  • For all other packets: user-configured hash-seed (0-65535; default 0), source IPv4 or IPv6 address, destination IPv4 or IPv6 address.

On 7220 IXR-D1, D2, D3 and 7220 IXR-H2 and H3 systems, the following can be used in the hash computation:

  • For IPv4 TCP/UDP non-fragmented packets: VLAN ID, user-configured hash-seed (0-65535; default 0), source IPv4 address, destination IPv4 address, IP protocol, L4 source port, L4 destination port. The algorithm is asymmetric.
  • For IPv6 TCP/UDP non-fragmented packets: VLAN ID, user-configured hash-seed (0-65535; default 0), source IPv6 address, destination IPv6 address, IPv6 flow label (even if it is 0), IP protocol (IPv6 next-header value in the last extension header), L4 source port, L4 destination port.
  • For all other packets: user-configured hash-seed (0-65535; default 0), source IPv4 or IPv6 address, destination IPv4 or IPv6 address.

Configuring IP ECMP load balancing

To configure IP ECMP load balancing, you specify hash-options that are used as input fields for the hash calculation, which determines the next-hop for packets matching routes with multiple ECMP hops.

The following example configures hash options for IP ECMP load balancing, including a hash seed and packet header field values to be used in the hash computation.

--{ * candidate shared default }--[  ]--
# info system load-balancing
    system {
        load-balancing {
            hash-options {
                hash-seed 128
                ipv6-flow-label false
            }
        }
    }

If no value is configured for the hash-seed, the default value is 0. If a hash-option is not explicitly configured, the default is true.

On 7250 IXR systems, if source-address is configured as a hash option, the destination-address must also be configured as a hash option. Similarly, if source-port is configured as a hash option, the destination-port must also be configured as a hash option.
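
For example, to exclude the L4 ports from the hash computation, both port options must be disabled together on those systems. The option names source-port and destination-port are assumed here; check the data model of your release.

--{ * candidate shared default }--[  ]--
# info system load-balancing
    system {
        load-balancing {
            hash-options {
                destination-port false
                source-port false
            }
        }
    }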

Configuring reboot options

You can reboot a platform component using the tools platform <component> [<slot>] reboot command in the CLI. The reboot command is supported for all platform components: the chassis, control modules (active/standby), fabric slots, and linecard slots.

This command triggers an immediate reboot; see the example following the list below.

  • Chassis: tools platform chassis reboot
  • Control slot: tools platform control <slot> reboot
  • Fabric slot: tools platform fabric <slot> reboot
  • Linecard slot: tools platform linecard <slot> reboot
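
For example, an immediate linecard reboot produces output similar to the following (illustrative; the message format is inferred from the delayed-reboot examples later in this section):

# tools platform linecard 1 reboot
 linecard slot 1 is rebooting now
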
You can use the following options with the reboot command. These options are platform specific.

delay

You can use the delay option to set a wait time before rebooting. The delay is specified in seconds. During this period, you can cancel any pending reboot operations.
Note: When a delayed reboot is pending for either chassis or active control component, setting a wait time to delay the reboot of any other component (linecard, standby control, fabric slot) is not supported.
  • Chassis: tools platform chassis reboot delay <value>
    
    # tools platform chassis reboot delay 2
    /platform:
     chassis will reboot in 2 seconds
    
  • Control slot: tools platform control <slot> reboot delay <value>
  • Fabric slot: tools platform fabric <slot> reboot delay <value>
  • Linecard slot: tools platform linecard <slot> reboot delay <value>

When doing a warm redundancy CPM switchover, any pending delayed reboot of the active or standby control card is discarded. The tools platform redundancy switchover command is not impacted by any pending delayed reboots.

cancel

You can use the cancel option to cancel any pending reboot on a platform component. When there are no pending delayed reboots, the reboot cancel command execution fails.
  • Chassis: tools platform chassis reboot cancel
    
    # tools platform chassis reboot cancel
    /platform:
        chassis reboot has been canceled
  • Control slot: tools platform control <slot> reboot cancel
  • Fabric slot: tools platform fabric <slot> reboot cancel
  • Linecard slot: tools platform linecard <slot> reboot cancel
Note: You cannot combine the cancel option with force or delay options.

force

You can use the force option to trigger a forced reboot of a platform component. This option is supported only for the chassis and control slots. The force option overrides all synchronization activities, and any soft checks that would prevent a reboot (for example, unsaved configuration differences between the running and startup configurations) are ignored.

Caution: Forcing a reboot immediately after an image change may result in a standby module booting an older image.
  • Chassis: tools platform chassis reboot force
  • Control slot: tools platform control <slot> reboot force

message

You can use the message option to broadcast a user-defined message to other users before the reboot occurs. The message option can be combined with an immediate, delayed, or canceled reboot.

  • Chassis: tools platform chassis reboot message <value>
    
    #  tools platform chassis reboot message "This is a message" 
     chassis is rebooting now: "This is a message"
    
  • Control slot: tools platform control <slot> reboot message <value>
  • Fabric slot: tools platform fabric <slot> reboot message <value>
  • Linecard slot: tools platform linecard <slot> reboot message <value>
The following examples demonstrate different reboot scenarios with the message option:
  • Immediate reboot with the message option.
    
    # tools platform chassis reboot message "this is a message"
     chassis is rebooting now: "this is a message"
    A message indicating the initiation of an immediate reboot is broadcast to all users only if the immediate reboot command is executed with the message option.
  • Delayed reboot with the message option.
    
    # tools platform control A reboot delay 100 message "this is a message"
     control slot A is rebooting in 100 seconds (at 2023-01-24T01:45:20.556Z): "this is a message"
    /platform/control[slot=A]:
        control slot A will reboot in 100 seconds
    A message indicating the reboot delay is broadcast to all users only if the reboot delay command is executed with the message option.
  • When a delayed reboot is scheduled for an active control card or the chassis, the user-defined message is broadcast, and the prompt is updated accordingly.
    # tools platform control A reboot delay 3600 message "control reboot message"
    /platform/control[slot=A]:
     control slot A will reboot in 3600 seconds
     --{ [ACTIVE CONTROL REBOOT IN 3599 SECONDS (control reboot message)] running }--[ ]--
    
    
    # tools platform chassis reboot delay 3600 message "reboot message"
    /platform:
     chassis will reboot in 3600 seconds
     --{ [SYSTEM REBOOT IN 3599 SECONDS (reboot message)] running }--[ ]--
  • Canceling a delayed reboot with the message option.

    # tools platform linecard 1 reboot cancel message "this is a cancel message"
     linecard slot 1 reboot has been canceled: "this is a cancel message"
    /platform/linecard[slot=1]:
     linecard slot 1 reboot has been canceled

    This command cancels the pending delayed reboots and broadcasts the user-defined message. A message indicating the reboot cancel is broadcast to all users only if the reboot cancel command is executed with the message option.

  • When a delayed reboot time expires and the reboot is about to happen, the following message is broadcast:
    
    chassis: chassis is rebooting now
    control: control slot <A or B> is rebooting now
    linecard: linecard slot <x> is rebooting now
    fabric: fabric slot <x> is rebooting now
    Note: This message is broadcast irrespective of executing the reboot delay command with or without the message option.
  • When a delayed reboot fails, the following message is broadcast:
    
    chassis: chassis reboot failed: "<reason for failure>"
    control: control slot <A or B> reboot failed: "<reason for failure>"
    linecard: linecard slot <x> reboot failed: "<reason for failure>"
    fabric: fabric slot <x> reboot failed: "<reason for failure>"
    
    Note: This message is broadcast irrespective of executing the reboot delay command with or without the message option.
  • When a delayed reboot is pending for a component, and a reboot (delayed or immediate) is attempted for the chassis or active control, the reboot is rejected with the following message:

    chassis: disallowed, delayed reboot is pending for linecard slot 1
    active control slot A: disallowed, delayed reboot is pending for linecard slot 1
    This example shows a reboot attempted on the chassis or active control slot A while a delayed reboot is pending for linecard slot 1.
    Note: The error message lists only the first component found with a delayed reboot pending. In this example, if all linecards <1..4> have pending delayed reboots, the error message highlights only the pending reboot for linecard slot 1.
  • When a delayed reboot is pending for the chassis or active control, and a reboot (delayed or immediate) is attempted for any other platform component, the reboot is rejected with the following error message.

    This example shows a reboot attempted on a fabric slot while a delayed reboot is pending for active control slot A.
    fabric slot <x>: disallowed, delayed reboot is pending for control slot A
    This example shows a reboot attempted on a fabric slot while a delayed reboot is pending for the chassis.
    fabric slot <x>: disallowed, delayed reboot is pending for chassis
  • When a delayed reboot is pending for a component, and a reboot (delayed or immediate) is attempted for the same component, the reboot is rejected with a message indicating that the pending delayed reboot must first be canceled.
  • When the wait time specified for the reboot delay exceeds the maximum delay limit, the delay is capped at the maximum value and the following message is displayed:
    
    # tools platform linecard 1 reboot delay 18446744073709551615
    /platform/linecard[slot=1]:
     delay has been limited to 16772217903 seconds
     /platform/linecard[slot=1]:
     linecard slot 1 will reboot in 16772217903 seconds

warm

When you execute the reboot command with the warm option, it validates the current configuration and prompts for confirmation. On confirmation, the system reboots without impacting the datapath. If a warm reboot is performed after a new image is configured, the system upgrades to the new image.

Before performing a warm reboot, confirm that the current SR Linux configuration and state support warm reboot, using the tools platform chassis reboot warm validate command.

--{ running }--[  ]--
A:# tools platform chassis reboot warm validate
/platform:
    Warm reboot validate requested
 /:
    Success

If the validation is successful, proceed with the warm reboot.

If the validation is unsuccessful, or if an attempt to perform a warm reboot fails, you can force the warm reboot using the additional force option.
Caution: Forcing a warm reboot may result in a service outage. The force option overrides any warnings, such as peers that are not configured, or peers that do not support graceful restart.
--{ running }--[  ]--
A:# tools platform chassis reboot warm force
/platform:
    Warm reboot force requested
 
/:
    Success

See the Configuration state support section in the Software Installation guide for information about how to use warm reboot during an ISSU.

Non-stop forwarding

Non-stop forwarding (NSF) is the sequence of processes required to switch control of a running system between two supervisors or CPMs (active/standby) without disrupting data forwarding. It allows the router to continue forwarding data with previously known route and state information while the control plane restarts and reconverges.

Similar to system warm reboot, NSF depends on graceful restart helpers, but it is primarily used for unplanned outages (for example, a control plane failover) and cannot be used for upgrades.

During an NSF switchover, no control plane or management plane functions are available; this includes refreshing of neighbors and slow-path functions such as DHCP relay and responding to ARP/ND.

In SR Linux, NSF leverages application warm restart: the fundamental design is to synchronize the IDB server (idb_server) with the standby CPM (supervisor), and to allow applications to leave their state information in IDB during the restart and recover it afterward. For more information, see Triggering redundancy switchover and Forcing redundancy synchronization.

Note:

Currently, the NSF feature is supported only for 7250 IXR-6/10/6e/10e platforms.

NSF features and limitations

NSF is supported with the following feature set:
  • Supports ACL, IPv4, and IPv6.
  • Supports the complete QoS feature set, including queue configuration, classifiers, and ECN, with counters cleared.
  • Supports IPv4/IPv6 routing.
  • Supports BGP with IPv4/IPv6-unicast address families, where all neighboring devices support graceful restart helper.
    Note: Peers that do not support graceful restart withdraw routes during outages, impacting the data path for traffic destined for the system undergoing NSF. However, NSF must be attempted for any peers that support graceful restart helper.
  • Supports IS-IS with IPv4/IPv6-unicast address families, where all neighboring devices support graceful restart helper.
    Note: Peers that do not support graceful restart withdraw routes during outages, impacting the data path for traffic destined for the system undergoing NSF. However, NSF must be attempted for any peers that support graceful restart helper.
  • Supports LAG and LACP with both slow and fast timers.
  • Supports P4RT.
  • Supports gRIBI. During NSF, though management plane functions are unavailable, gRIBI-programmed routes are persisted.
  • Supports sFlow on the slow path.
  • Supports LLDP.
    Note: If the session times out, adjacency loss may occur.
Currently, NSF is not supported for the following:
  • MPLS ACLs
  • BFD
  • Any form of MPLS - LDP/SR
  • OSPFv2/v3

Triggering redundancy switchover

Use the tools platform redundancy switchover command to trigger a redundancy switchover from the active to the standby control module. The switchover happens in conjunction with a cold restart.

To trigger redundancy switchover:

--{ running }--[  ]--
# tools platform redundancy switchover
The NSF implementation changes the default behavior of this command: when a redundancy switchover is triggered, the system now always attempts an NSF failover in conjunction with a warm restart.
Note: The command tools platform redundancy switchover is not impacted by any pending delayed reboots.

Forcing redundancy synchronization

You can use the tools platform redundancy synchronize overlay system command to synchronize the overlay file system or system-required data between the active and standby control modules. The NSF implementation extends this behavior by also synchronizing the file system and IDB server information between the modules.

To synchronize, use the following command:
--{ running }--[  ]--
# tools platform redundancy synchronize overlay system