Connection monitoring and error recovery

Overview

A JMS client must monitor for and, if required, recover from the following scenarios: a loss of connectivity, a loss of events, and the removal of a subscription.

The scenarios can occur separately or in conjunction. If a client has a durable subscription, a loss of connectivity may occur without causing events to be lost. Event loss can occur under heavy load without a connectivity loss; in this case, the client is notified using a JmsMissedEvents message. A subscription removal can accompany a connectivity loss, for example, after a main server activity switch or restart.

The manner in which a client manages each scenario depends on the specific needs of the OSS application and the expectations of its customers. For example, for some applications, a loss of connectivity should always result in the OSS application reconnecting to the NFM-P. In other situations, the desired behavior may depend on the cause of the connectivity loss. For example, if an NFM-P administrator intentionally disconnects an OSS, it may not be desirable for the OSS to reconnect immediately.

How to recover from lost events also depends on the OSS. If an OSS uses JMS events to maintain a local information store that mirrors some data set in the NFM-P (such as a network inventory or a list of current alarms), then a loss of events results in the OSS being out of sync with the NFM-P.

The following scenarios could be used to determine whether the OSS needs to resync the database of inventory and alarm information:

When an OSS detects a lack of synchronization, the recovery scenarios include:

When you implement a recovery procedure, you must consider that events continue to occur in the network while an OSS is busy resynchronizing or populating its database for the first time. For this reason, an OSS must process events while it is resynchronizing, and may need to recover from error conditions that require it to reconnect or restart the resync.
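
One possible way to keep processing events during a resync is to buffer them and replay them once the fresh data set is loaded. The following Python sketch illustrates that idea only; the class name ResyncingStore, its methods, and the event shape (an object name plus its full new attributes) are hypothetical and are not part of the NFM-P XML API. It also assumes that every buffered event is newer than the snapshot, which a real client would need to verify, for example by comparing timestamps.

```python
class ResyncingStore:
    """Local mirror that buffers events while a resync is in progress."""

    def __init__(self):
        self.objects = {}        # object name -> latest known attributes
        self._resyncing = False
        self._pending = []       # events received during the resync

    def begin_resync(self):
        # Start (or restart) a resync; events buffered so far are discarded
        # because the new snapshot is assumed to supersede them.
        self._resyncing = True
        self._pending = []

    def on_event(self, object_name, attributes):
        if self._resyncing:
            # Do not drop the event: keep it until the snapshot is loaded.
            self._pending.append((object_name, attributes))
        else:
            self.objects[object_name] = attributes

    def finish_resync(self, snapshot):
        # Load the snapshot, then replay buffered events in arrival order.
        self.objects = dict(snapshot)
        for object_name, attributes in self._pending:
            self.objects[object_name] = attributes
        self._pending = []
        self._resyncing = False
```

If the resync has to be restarted, for example after a reconnection, calling begin_resync again clears the buffer and starts over.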

JMS exceptions

You must handle JMS exceptions if you have a JMS connection to an NFM-P server. JMS internally monitors client connections to the JMS server and throws an exception if a connection between the JMS server and a client is lost. You must implement the javax.jms.ExceptionListener interface on the JMS client to enable the monitoring of exceptions. The interface contains a call-back method that is invoked for all exceptions. A typical implementation of the call-back method attempts to reconnect and may take another action, such as generating an event.
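
The retry logic that such a call-back might trigger can be sketched independently of the JMS API. The following Python sketch shows one common approach, reconnection with exponential backoff. The function reconnect_with_backoff, its parameters, and the connect callable are hypothetical illustrations; in a Java client the equivalent logic would be started from the onException method of javax.jms.ExceptionListener, and the retry limits shown are assumptions rather than NFM-P recommendations.

```python
import time


def reconnect_with_backoff(connect, max_attempts=5, initial_delay=1.0,
                           max_delay=60.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is reached.

    connect is a caller-supplied function that returns a connection object
    or raises ConnectionError. The delay doubles after each failure, up to
    max_delay. The last ConnectionError is re-raised if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Whether to reconnect at all remains an application decision; for example, a client may choose not to retry after an intentional disconnection by an administrator.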

Monitoring for incoming events

In addition to handling exceptions, an OSS must monitor incoming events to ensure that the JMS connection is active. A KeepAliveEvent is published at approximately 30-second intervals to each of the JMS topics even when no other messages are sent.

KeepAliveEvents may be received ahead of other event types.

If no events are received within a reasonable time period, you must investigate the status of the NFM-P server.
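
A simple watchdog can track how long it has been since the last event arrived. The following Python sketch is one possible shape for such a check. The class KeepAliveWatchdog and the default threshold of 90 seconds (three times the approximate 30-second KeepAliveEvent interval) are assumptions chosen for illustration, not values defined by the NFM-P.

```python
import time


class KeepAliveWatchdog:
    """Tracks the time elapsed since the last event of any type."""

    def __init__(self, timeout_seconds=90.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock              # injectable for testing
        self._last_seen = clock()

    def record_event(self):
        # Call for every received message, including KeepAliveEvents.
        self._last_seen = self._clock()

    def seconds_since_last_event(self):
        return self._clock() - self._last_seen

    def is_stale(self):
        # True when nothing has arrived for longer than the timeout.
        return self.seconds_since_last_event() > self._timeout
```

A client would call record_event from its message listener and poll is_stale from a timer; a stale result is a prompt to investigate the server status, not proof that the connection is down.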

XML API session termination

You can use the NFM-P GUI client to close and remove a durable subscription when you no longer require a durable client. See the procedure to disconnect an XML API JMS client connection or remove a durable subscription in the NSP System Administrator Guide for more information about removing durable subscriptions.

The XML API uses the following JMS event to notify OSS client applications of a session termination:

TerminateClientSession

The TerminateClientSession event indicates that a client JMS session is about to be closed. The client must clean up the disconnected session when the message is received. Additional session termination behavior depends on the requirements of your OSS client application. For example, the OSS client application may also be required to close.

Missed events

Both durable and non-durable subscriptions can be used by OSS clients that cannot tolerate losing events without taking corrective action. However, in certain situations, events can be missed and subscribers are notified with a JmsMissedEvents message.

JMS messages are queued by the NFM-P until they are acknowledged by the JMS client. If the JMS message queue overflows, the following occurs for both durable and non-durable JMS subscriptions:

The following are example scenarios in which messages may be missed:

The following table lists and describes the alarms that are raised against JMS clients.

Table 4-8: JMS client alarms

Alarm                          Description
JMSDurableClientReset          Raised when a durable JMS client is reset as the result of a JMS server restart or activity switch.
JMSClientMessagesRemoved       Raised when a JMS client has messages removed after exceeding the configured message limit. This applies to OSS durable subscribers only.
JMSDurableClientUnsubscribed   Raised when a durable JMS client is automatically unsubscribed. This occurs when a disconnected durable client exceeds the configured message limit.

JmsMissedEvents message

The NFM-P sends a JmsMissedEvents message to indicate that events have been lost, allowing subscribers to detect missed events. The JmsMissedEvents message is a StateChangeEvent, as shown in the following figure.

Figure 4-18: JmsMissedEvents message example
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP:Header>
      <header xmlns="xmlapi_1.0">
         <eventName>JmsMissedEvents</eventName>
         <MTOSI_osTime>1222891102168</MTOSI_osTime>
         <ALA_clientId>JMS_client@n</ALA_clientId>
         <MTOSI_NTType>ALA_OTHER</MTOSI_NTType>
         <MTOSI_objectType>StateChangeEvent</MTOSI_objectType>
         <ALA_category>GENERAL</ALA_category>
         <ALA_isVessel>false</ALA_isVessel>
         <ALA_allomorphic/>
         <ALA_eventName>JmsMissedEvents</ALA_eventName>
         <MTOSI_objectName/>
         <ALA_OLC>0</ALA_OLC>
      </header>
   </SOAP:Header>
   <SOAP:Body>
      <jms xmlns="xmlapi_1.0">
         <stateChangeEvent>
            <eventName>JmsMissedEvents</eventName>
            <state>jmsMissedEvents</state>
         </stateChangeEvent>
      </jms>
   </SOAP:Body>
</SOAP:Envelope>
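
A client can recognize this message by reading the eventName element in the SOAP header. The following Python sketch shows one way to do that with the standard library, assuming the client has the message available as the XML text shown in Figure 4-18. The function names event_name and is_missed_events are hypothetical, and SAMPLE is an abbreviated copy of the figure.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in Figure 4-18.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "api": "xmlapi_1.0",
}


def event_name(message_text):
    """Return the eventName from the SOAP header, or None if it is absent."""
    root = ET.fromstring(message_text)
    node = root.find("soap:Header/api:header/api:eventName", NS)
    return node.text if node is not None else None


def is_missed_events(message_text):
    return event_name(message_text) == "JmsMissedEvents"


SAMPLE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP:Header>
      <header xmlns="xmlapi_1.0">
         <eventName>JmsMissedEvents</eventName>
         <ALA_clientId>JMS_client@n</ALA_clientId>
      </header>
   </SOAP:Header>
   <SOAP:Body>
      <jms xmlns="xmlapi_1.0">
         <stateChangeEvent>
            <eventName>JmsMissedEvents</eventName>
            <state>jmsMissedEvents</state>
         </stateChangeEvent>
      </jms>
   </SOAP:Body>
</SOAP:Envelope>"""

if is_missed_events(SAMPLE):
    print("Events were missed; a recovery action is required.")
```

The same event_name helper could be used to dispatch other event types by name, so that the missed-events check is one branch of the client's normal message handling.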

Recovery from missed events

Clients must be able to recognize when events are missed and take the appropriate recovery action. For example, in an OSS application for which events are being used to maintain inventory information, the recovery action could include:

Although an OSS application must have measures in place to recover from missed events, it can also take steps to prevent them. The following are examples of ways an OSS application can prevent missed events:

© 2023 Nokia. Nokia Confidential Information

Use subject to agreed restrictions on disclosure and use.