Classic management fault tolerance
Introduction
The NFM-P uses component redundancy to ensure that there is no single point of failure in the NFM-P system.
Redundant physical network interfaces and points of network entry ensure that there is no single point of failure between the NFM-P system and the managed network. Redundant network paths, for example, in-band and out-of-band management, can help to prevent the isolation of a main server from the network in the event of a routing failure.
When a communication failure is detected between the primary NSP and the NFM-P main/DB server, JMS server, auxiliary server, or auxiliary database, the NspBaseServiceDown alarm is raised against the affected component to provide real-time notification of the fault.
Main server and database redundancy
A redundant NFM-P system consists of a primary main server and primary main database that actively manage the network, and a second main server and database in warm standby mode. The following figure shows a distributed NFM-P system in a redundant deployment.
Figure 8-3: Redundant NFM-P system
Main server redundancy
Main server redundancy is achieved using clustering technology provided by a JBoss server on each main server. The primary and standby main servers regularly poll each other to monitor availability. Traps from the managed network are always sent to both main servers, so that trap processing is not delayed if a main server fails.
If the primary server loses visibility of the standby server, it notifies the GUI clients. If the standby server loses visibility of the primary server, the standby server attempts to become the primary server by connecting to the primary database.
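The reaction of each main server to a failed availability poll can be sketched as follows. This is only an illustrative model, not NFM-P code: the MainServer class, its fields, and the event strings are hypothetical names, and the behavior when the primary database is also unreachable is an assumption that the text above does not describe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MainServer:
    """Hypothetical model of one main server in a redundant pair."""
    name: str
    role: str  # "primary" or "standby"
    events: List[str] = field(default_factory=list)

    def on_peer_unreachable(self, primary_db_reachable: bool) -> None:
        """React to a failed availability poll of the peer main server."""
        if self.role == "primary":
            # The primary keeps managing the network and only reports the loss.
            self.events.append("notify GUI clients: standby main server unreachable")
        elif primary_db_reachable:
            # The standby takes over by connecting to the primary database.
            self.role = "primary"
            self.events.append("connected to primary database; now primary")
        else:
            # Assumption: without the primary database the takeover attempt fails.
            self.events.append("takeover attempt failed: primary database unreachable")


primary = MainServer("main-a", "primary")
standby = MainServer("main-b", "standby")
primary.on_peer_unreachable(primary_db_reachable=True)
standby.on_peer_unreachable(primary_db_reachable=True)
```

In this sketch the two calls model each server losing sight of the other: the primary only records a client notification, while the standby changes its role after reaching the primary database.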
Main database redundancy
NFM-P database redundancy uses Oracle Data Guard replication in real-time apply mode to keep the standby database synchronized with changes in the primary database. The supported fault-recovery operations are database switchovers and database failovers. A switchover is a manual operation that switches the primary and standby database roles. A failover is an automatic operation that forces the standby database to become the primary database when a primary main database failure is detected.
The primary main server regularly polls each main database. If the primary or standby database is unavailable, the main server notifies the GUI clients. If both main servers lose contact with the primary main database, a failover occurs and the standby main database assumes the primary role.
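The polling rules above can be expressed as a small decision function. The following is a minimal sketch under stated assumptions: the function name, its boolean parameters, and the action strings are hypothetical and chosen for illustration; they do not correspond to an NFM-P interface.

```python
from typing import List


def database_actions(
    primary_server_sees_primary_db: bool,
    primary_server_sees_standby_db: bool,
    standby_server_sees_primary_db: bool,
) -> List[str]:
    """Return the actions implied by one round of database polling."""
    actions: List[str] = []
    # The primary main server reports any database it cannot reach.
    if not primary_server_sees_primary_db:
        actions.append("notify GUI clients: primary database unavailable")
    if not primary_server_sees_standby_db:
        actions.append("notify GUI clients: standby database unavailable")
    # A failover requires that BOTH main servers have lost the primary database.
    if not primary_server_sees_primary_db and not standby_server_sees_primary_db:
        actions.append("failover: standby database assumes primary role")
    return actions
```

Note that in this model a loss of contact seen by only one main server produces a notification but no failover, which matches the condition that both main servers must lose contact with the primary main database.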
Auxiliary servers and NFM-P redundancy
Auxiliary servers are passively redundant; they do not cause or initiate main server or database redundancy activities. However, if a Preferred auxiliary server ceases to respond to requests from the primary main server and a Reserved auxiliary server is available, the main server directs the current and subsequent requests to the Reserved auxiliary server until the Preferred auxiliary server is available again.
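The Preferred and Reserved selection rule can be illustrated with a short sketch. The AuxiliaryServer class and the select_auxiliary function are hypothetical names introduced for this example, and the sketch assumes that the main server re-evaluates the choice for each request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuxiliaryServer:
    """Hypothetical record of one auxiliary server known to the main server."""
    name: str
    kind: str  # "Preferred" or "Reserved"
    responding: bool = True


def select_auxiliary(servers: List[AuxiliaryServer]) -> Optional[AuxiliaryServer]:
    """Pick a responding Preferred server, else a responding Reserved one."""
    for kind in ("Preferred", "Reserved"):
        for server in servers:
            if server.kind == kind and server.responding:
                return server
    return None  # No auxiliary server is currently available.


pool = [AuxiliaryServer("aux-1", "Preferred"), AuxiliaryServer("aux-2", "Reserved")]
```

Because the Preferred type is always checked first, requests return to the Preferred auxiliary server as soon as it responds again.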
An auxiliary server communicates only with the current primary server and database. After an NFM-P redundancy activity such as a database failover, the primary main server directs the auxiliary servers to communicate with the current primary component instead of the former primary component.
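The redirection that follows a redundancy activity can be pictured as a simple update of each auxiliary server's targets. This is a rough sketch only: the Auxiliary class, the redirect_auxiliaries function, and the component names are hypothetical and stand in for whatever mechanism the main server actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Auxiliary:
    """Hypothetical view of the components an auxiliary server talks to."""
    name: str
    main_server: str
    database: str


def redirect_auxiliaries(
    auxiliaries: List[Auxiliary], primary_server: str, primary_database: str
) -> None:
    """Point every auxiliary server at the current primary components."""
    for aux in auxiliaries:
        aux.main_server = primary_server
        aux.database = primary_database


auxiliaries = [Auxiliary("aux-1", "main-a", "db-a"), Auxiliary("aux-2", "main-a", "db-a")]
# After a database failover, db-b is the primary database in this example.
redirect_auxiliaries(auxiliaries, primary_server="main-a", primary_database="db-b")
```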