Alarms not appearing for rapidly reoccurring faults

Issue

Under certain circumstances, the NSP does not generate an alarm when the underlying fault occurs and resolves faster than the NSP alarm flush interval. This applies only to alarms created by NSP alarm rules, and not to alarms received from other sources, for example, using NETCONF. By default, the flush interval is four seconds, and the NSP may not capture events that occur and resolve faster than that, such as an IS-IS interface that flaps once every second.
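To illustrate why a short-lived fault can slip through, the following Python sketch models the flush as a periodic comparison of the buffered fault state against the last flushed state. It is a simplified toy model, not the NSP implementation: the function name flushed_transitions, the millisecond timestamps, and the coalescing behavior are all assumptions made for illustration.

```python
def flushed_transitions(events, flush_interval_ms, end_ms):
    """Return the state changes that survive periodic flushing.

    events: list of (time_ms, is_active) tuples, sorted by time.
    Assumption (hypothetical model): only the net fault state at each
    flush is compared with the previously flushed state.
    """
    emitted = []
    last_flushed = False  # state recorded at the previous flush
    current = False       # latest buffered state
    idx = 0
    for flush_at in range(flush_interval_ms, end_ms + 1, flush_interval_ms):
        # Consume every event that arrived before this flush.
        while idx < len(events) and events[idx][0] < flush_at:
            current = events[idx][1]
            idx += 1
        # Only a net change since the last flush produces an alarm update.
        if current != last_flushed:
            emitted.append((flush_at, current))
            last_flushed = current
    return emitted


# A fault that raises at 0.5 s and clears at 1.7 s.
flap = [(500, True), (1700, False)]

print(flushed_transitions(flap, 4000, 8000))  # []
print(flushed_transitions(flap, 1000, 8000))  # [(1000, True), (2000, False)]
```

In this model, the fault leaves no trace with a four-second interval because it raises and clears between two flushes, while a one-second interval records both the raise and the clear.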

You can reduce the duration of the flush interval to capture rapidly flapping events. However, a shorter interval degrades NSP performance, because it increases how often the NSP writes to the alarm database. The following procedure describes how to configure the NSP alarm flush interval.

Configuring the NSP alarm flush interval

Note: This procedure involves restarting a component of the NSP, and should be performed only during a maintenance window. If resync-fw is upgraded, you must repeat this procedure to set the alarm flush interval again.

 

Log in as the root user on the NSP cluster host. For information about logging in to the NSP, see the NSP System Administrator Guide.


Open a console window.


In the nspos-resync-fw Kubernetes pod, open the following file in a text editor:

/opt/nsp/os/resync-fw/config/resync-fw-overrides.conf


The file contains parameter settings in nested declarations. The existing parameters depend on your current NSP configuration. Find or add the flush-buffer-interval-in-second parameter, which is nested under resync-fw > mdm > notification, as shown below:

resync-fw
{
   mdm
   {
      notification
      {
         flush-buffer-interval-in-second = 1
      }
   }
}

Set the value of the parameter to the required duration of the NSP alarm flush interval, in seconds.


Save the configuration file, and enter the following to restart the nspos-resync-fw pod:

kubectl delete pod nspos-resync-fw↵
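If you prefer to check the edit before restarting the pod, the following Python sketch reads and updates the flush-buffer-interval-in-second value in a copy of the file contents. It is a hypothetical helper, not an NSP tool: the function names get_flush_interval and set_flush_interval are invented for illustration, the positive-integer check is an assumption about valid values, and the sketch only rewrites a parameter that is already present.

```python
import re

PARAM = "flush-buffer-interval-in-second"
# Match "flush-buffer-interval-in-second = <digits>" (also accepts ":").
_PATTERN = re.compile(r"(?<![\w-])(" + re.escape(PARAM) + r"\s*[=:]\s*)(\d+)")


def get_flush_interval(conf_text):
    """Return the configured interval in seconds, or None if not set."""
    match = _PATTERN.search(conf_text)
    return int(match.group(2)) if match else None


def set_flush_interval(conf_text, seconds):
    """Return conf_text with the existing parameter set to `seconds`."""
    # Assumption: the value must be a whole number of seconds >= 1.
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError("seconds must be a positive integer")
    new_text, count = _PATTERN.subn(
        lambda m: m.group(1) + str(seconds), conf_text
    )
    if count == 0:
        # The parameter is absent; it must be added by hand under
        # resync-fw > mdm > notification.
        raise KeyError(PARAM + " not found")
    return new_text


sample = """resync-fw
{
   mdm
   {
      notification
      {
         flush-buffer-interval-in-second = 4
      }
   }
}
"""

updated = set_flush_interval(sample, 1)
print(get_flush_interval(sample))   # 4
print(get_flush_interval(updated))  # 1
```

The helper works on text only, so the surrounding nested declarations and any other parameters in the file are left unchanged.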