How do I respond to redundancy failures?
Overview
The following list describes how the NFM-P responds to each type of redundancy failure.
- Primary server loses contact with primary database
  If the standby server can communicate with the primary database and the managed NEs, the primary server performs a server activity switch. No database failover occurs.
  If automatic database realignment is enabled, the new primary server performs a database switchover.
- Primary server loses contact with managed NEs
  If the standby server can communicate with the primary database and the managed NEs, the primary server performs a server activity switch.
  If automatic database realignment is enabled, the new primary server performs a database switchover.
- Primary server loses contact with primary database and managed NEs
  If the standby server can communicate with the primary database and the managed NEs, the primary server performs a server activity switch. No database failover occurs.
  If automatic database realignment is enabled, the new primary server performs a database switchover.
- Primary server loses contact with primary database, managed NEs, and standby server
  The standby server activates to become the new primary server, and if automatic database realignment is enabled, initiates a database switchover.
- Both servers lose contact with primary database
  The primary server initiates a database failover, and if automatic database realignment is enabled, also initiates a server activity switch.
- Both servers lose contact, primary server and database can communicate
  The primary server and database remain the primary server and database. The NFM-P raises an alarm about the server communication failure.
- Both servers lose contact with managed NEs
  If the primary and standby servers can each communicate with the preferred database, no server activity switch or database failover occurs. The NFM-P raises a reachability alarm against each NE in the network.
- Both servers lose contact with primary database and managed NEs
  If the primary and standby servers can communicate with each other, no server activity switch or database failover occurs. However, the NFM-P system is unavailable; manual intervention such as a database failover is required.
- Both servers fail, primary database isolated, standby database operational
  When both servers return to operation, the servers cannot connect to the primary database. Because the state of the standby database is unknown, no database failover occurs; manual intervention such as a database switchover is required.
Collocated system, primary station unreachable
Figure 17-11: Primary server and database station down, collocated system
The following occur when the primary station becomes unresponsive:
- The standby server and database become the primary server and database.
- Redundancy is restored when the former primary station returns to service as the standby station.
Distributed system, primary server unreachable
Figure 17-12: Primary server unreachable, distributed system
The following occur when the primary server becomes unresponsive:
- The standby server detects the connectivity loss and becomes the primary server.
- The new primary server raises alarms about the unavailability of the former primary server, and about the activity switch.
- If automatic database realignment is enabled, the new primary server initiates a database switchover.
- When connectivity is restored, the former primary server assumes the standby server role.
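The sequence above can be pictured as a small role-transition model. This is a minimal sketch for reasoning about the behavior, not NFM-P code; the server names, the "unreachable" marker, and both functions are assumptions made for the example.

```python
def primary_unreachable(roles, auto_realign=False):
    """Model the standby server's reaction to losing the primary server.

    roles maps a server name to "primary" or "standby" (hypothetical model).
    Returns the new role map and the ordered list of events.
    """
    old_primary = next(n for n, r in roles.items() if r == "primary")
    old_standby = next(n for n, r in roles.items() if r == "standby")

    events = [
        f"{old_standby}: detects connectivity loss, becomes primary server",
        f"{old_standby}: raises alarm, {old_primary} unavailable",
        f"{old_standby}: raises alarm, activity switch",
    ]
    if auto_realign:
        # Only when automatic database realignment is enabled
        events.append(f"{old_standby}: initiates database switchover")

    new_roles = {old_primary: "unreachable", old_standby: "primary"}
    return new_roles, events


def connectivity_restored(roles):
    """The formerly unreachable server rejoins in the standby role."""
    return {n: ("standby" if r == "unreachable" else r) for n, r in roles.items()}
```

Starting from one primary and one standby server, calling primary_unreachable and then connectivity_restored leaves the two servers with their roles exchanged, as the last list item describes.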
Distributed system, standby server unreachable
Figure 17-13: Standby server unreachable, distributed system
The following occur when the standby server becomes unresponsive:
- The standby server interprets the primary server unresponsiveness as a primary server failure, and attempts to assume the primary server role.
- The primary server generates an alarm to indicate that the standby server is down.
- When reachability is restored, the standby server resumes the standby role and the alarm clears.
Distributed system, managed network unreachable by primary side
Figure 17-14: Network failure on primary side, distributed system
The following occur after the connectivity loss is detected:
- The initial primary server continues to operate as a primary server.
- The initial primary server generates an alarm about the standby server unavailability, and a reachability alarm against each NE in the network.
Note: You can eliminate a single point of hardware or network failure by using redundant interfaces and redundant physical network paths. See the NSP NFM-P Planning Guide for more information.
Split complex
A split complex is a scenario in which the two servers in a collocated or distributed system lose contact with each other, but each server can still communicate with the preferred database, as shown in the following figure.
Figure 17-15: Split complex, collocated or distributed system
The following occur after the connectivity loss is detected:
- The initial primary server and database roles do not change; the initial primary server continues to manage the network. The client sessions are not interrupted.
- The primary server raises an alarm about the communication failure.
- The standby server and database switch roles to become a second primary server and database.
- New clients connect to the initial primary server; however, if a client explicitly tries to connect to the second primary server, a session is established.
- When the servers regain contact: