What are the NSP DR mechanisms?
Description
The NSP Disaster Recovery (DR) function involves redundant NSP clusters in a warm standby configuration for fault tolerance in the event of a cluster failure. The following procedures describe how to control and manage NSP DR.
DR functions
The following NSP DR functions swap the primary and standby NSP cluster roles:
- failover: automatic DR role change initiated by the standby NSP cluster when a primary cluster failure is suspected
- switchover: manual DR operation that switches the NSP cluster roles
Failovers and switchovers
NSP DR failovers and switchovers are controlled by the ASM and role manager services, which run as the nspos-asm-app and nsp-role-manager pods in each DR NSP cluster. The standby role manager periodically checks the connectivity to the role manager in the primary NSP cluster.
In addition, the role manager monitors essential pods and services in the primary cluster.
If the role manager connectivity check fails for two minutes, or if an essential primary pod or service is down, the ASM triggers a failover to the standby cluster. The standby role manager and NSP cluster then assume the primary role. When the fault is resolved, the NSP automatically returns to normal operation with functional primary and standby clusters.
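The failover condition described above can be illustrated with a minimal Python sketch. This is not NSP code: the class name FailoverMonitor, the observe method, and its parameters are hypothetical, and the sketch assumes that the two-minute limit applies to an uninterrupted run of failed connectivity checks and that an essential pod or service outage triggers a failover immediately.

```python
from dataclasses import dataclass
from typing import Optional

# Two minutes, as stated in the description above.
CONNECTIVITY_TIMEOUT_S = 120.0


@dataclass
class FailoverMonitor:
    """Hypothetical model of the standby-side failover decision (not NSP code)."""

    # Time of the first failed check in the current run of failures, if any.
    first_failure_at: Optional[float] = None

    def observe(self, now: float, peer_reachable: bool, essential_up: bool) -> bool:
        """Record one check result; return True if a failover should be triggered."""
        if peer_reachable:
            # A successful check ends the current run of failures.
            self.first_failure_at = None
        elif self.first_failure_at is None:
            # First failed check: start measuring the outage.
            self.first_failure_at = now

        timed_out = (
            self.first_failure_at is not None
            and now - self.first_failure_at >= CONNECTIVITY_TIMEOUT_S
        )
        # Either condition alone is enough to trigger a failover.
        return timed_out or not essential_up
```

In this sketch, a connectivity loss that clears before two minutes elapse resets the timer, so only a sustained loss or an essential pod or service outage results in a role change.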
The topic "How do I identify the NSP cluster DR roles?" describes how to display which role (primary or standby) is assigned to each NSP cluster. To restore the initial cluster roles after a failover, perform a manual switchover, as described in "How do I perform an NSP DR switchover?"
Note: After a failover or switchover, NSP functions restart processes that were interrupted. If downstream functions are not up yet, the restarted processes may fail. For example, if a network configuration deployment was auditing at the time of a failover, the audit will restart when Infrastructure Configuration Management is up. If Network Intents is not back up yet when the audit is restarted, the audit will fail. The process can be restarted manually when the NSP has stabilized.
Disabling and enabling failovers
NSP DR failovers are enabled by default in a DR NSP deployment. If required, you can disable failovers to prevent disruption during a period of maintenance activity, as described in "How do I disable NSP DR failovers?"
To identify whether failovers are enabled, see "How do I display the NSP DR failover setting?"
Note: The failover setting persists through an NSP software upgrade.
Note: For maximum fault tolerance, disable failovers only during a maintenance period, and re-enable them as soon as the maintenance period ends, as described in "How do I enable NSP DR failovers?"
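If maintenance activity is scripted, the disable-then-re-enable pattern can be enforced so that failovers are restored even when a maintenance step fails. The following Python sketch shows one way to do that. The maintenance_window function and its two callback parameters are hypothetical placeholders for whatever mechanism performs the documented disable and enable procedures; they are not part of any NSP API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def maintenance_window(
    disable_failover: Callable[[], None],
    enable_failover: Callable[[], None],
) -> Iterator[None]:
    """Disable failovers for the duration of a block, then always re-enable them.

    Both callbacks are hypothetical stand-ins for the documented procedures.
    """
    disable_failover()
    try:
        yield
    finally:
        # Runs on normal exit and on error, so failovers are never left disabled.
        enable_failover()
```

A usage example with stand-in callbacks that only record the setting:

```python
state = {"failovers_enabled": True}

with maintenance_window(
    lambda: state.update(failovers_enabled=False),
    lambda: state.update(failovers_enabled=True),
):
    pass  # maintenance steps go here
```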