Disaster recovery
A DR NSP deployment consists of identical primary and standby NSP clusters and ancillary components in separate, geographically distributed data centers, or “sites”. One cluster holds the primary role and processes all client requests.
The standby NSP cluster in a DR deployment operates in warm standby mode. If a primary cluster failure is detected, the standby automatically initializes as the primary, and fully assumes the primary role.
Note: Nokia strongly recommends that all primary components in a DR deployment be in the same physical facility. An NSP administrator can align the NSP component roles, as required.
NSP Role Manager
In a DR NSP deployment, the Role Manager runs in an NSP cluster and acts as a Kubernetes controller. The Role Manager monitors the Kubernetes objects for changes, and updates the objects as required based on the current primary or standby site role.
The Role Manager has the following operation modes:
- standalone: The Role Manager sets the cluster mode to 'active' at initialization time, and does nothing more.
- DR: The Role Manager negotiates the local role with the DR peer, determining which cluster will run in 'active' and which in 'standby' mode.
The Role Manager uses the configuration in the dr section of the NSP configuration file to identify the local and peer sites.
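The two operation modes can be pictured with a minimal Python sketch. The documentation does not describe how the negotiation with the DR peer works, so the rule used here (the local cluster yields only to a peer that already reports 'active') and the names `OperationMode` and `resolve_cluster_mode` are illustrative assumptions, not the Role Manager's actual logic.

```python
from enum import Enum


class OperationMode(Enum):
    """The two Role Manager operation modes named in the text."""
    STANDALONE = "standalone"
    DR = "DR"


def resolve_cluster_mode(operation_mode, peer_mode=None):
    """Return 'active' or 'standby' for the local cluster.

    Hypothetical helper: peer_mode is the mode that the DR peer reports
    ('active', 'standby', or None if nothing is known about the peer).
    """
    if operation_mode is OperationMode.STANDALONE:
        # Standalone: the mode is set to 'active' at initialization, nothing more.
        return "active"
    # DR: assumed negotiation rule, yield only to a peer that is already active.
    return "standby" if peer_mode == "active" else "active"
```

For example, `resolve_cluster_mode(OperationMode.DR, "active")` yields `'standby'` under this assumed rule, while a standalone cluster always resolves to `'active'`.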
The NSP monitors the following NSP base services in a DR deployment:
DR fault conditions
If any base service in a DR deployment is unavailable for more than three minutes, or two instances of a service in an HA+DR deployment are unavailable for more than three minutes:
- An activity switch occurs; consequently, the peer NSP cluster assumes the primary role.
- An alarm is raised against the service or containing pod to indicate that the service or pod is down.
  Note: Such an alarm may not be generated because of a base service disruption, depending on the circumstances.
- A major ActivitySwitch alarm is raised against the former active site, which is now the standby site.
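The trigger condition above can be expressed as a short Python sketch. The three-minute threshold and the one-instance (DR) versus two-instance (HA+DR) rule come from the text; the function name `activity_switch_required`, the input shape, and the service name `svc-a` are hypothetical and do not reflect any NSP interface.

```python
THRESHOLD_SECONDS = 180  # "more than three minutes", per the condition above


def activity_switch_required(outages, ha=False):
    """Decide whether the fault condition for an activity switch is met.

    outages: dict mapping a base service name to a list of outage
             durations in seconds, one entry per unavailable instance
             (hypothetical input shape).
    ha:      True for an HA+DR deployment, where two instances of the
             same service must be down past the threshold.
    """
    required_instances = 2 if ha else 1
    for durations in outages.values():
        # Count only the instances that exceed the three-minute threshold.
        long_outages = [d for d in durations if d > THRESHOLD_SECONDS]
        if len(long_outages) >= required_instances:
            return True
    return False


# Usage with a placeholder service name:
dr_switch = activity_switch_required({"svc-a": [200]})
ha_dr_switch = activity_switch_required({"svc-a": [200]}, ha=True)
```

In this sketch a single instance that is down for 200 seconds triggers a switch in a DR deployment, but not in an HA+DR deployment, where a second instance of the same service must also exceed the threshold.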
The following are the alarms that the NSP raises against the NmsSystem object in response to such a failure:
Note: If you clear an alarm while the failure condition is still present, the NSP does not raise the alarm again.
The following example describes an alarm condition in a simple DR deployment.
- An activity switch occurs; the standby site consequently assumes the primary role.
- A major ActivitySwitch alarm is raised against the former primary site, which is now the standby site.
DR for integrated components
A DR NSP deployment can include NFM-P and WS-NOC. NFM-P can be standalone or redundant; however, WS-NOC must be redundant. For example, if a DR deployment includes classic management, the NFM-P can be standalone and WS-NOC is redundant.
The following figure shows a simple NSP DR deployment.