Pathway for troubleshooting Cloud Native telemetry alarms

Purpose

This pathway provides a flow of tasks you can perform to investigate a Cloud Native (CN) telemetry alarm. In this scenario, the subscription has been created but the expected telemetry data output is not found. Telemetry alarms are supported for the following components:

  • gNMI collector

  • accounting processor (AP)

  • request processor (RP)

    RP alarms are raised per subscription, not per NE.

Alarm characteristics

Alarms are generated automatically based on specific failure conditions.

Most alarms do not clear automatically when the issue is resolved. After fixing the root cause, you must manually clear the alarm from the Current Alarms view.

Alarm troubleshooting flow
Figure 5-4: Telemetry alarm troubleshooting flow
Stages
Determine the impacted component
 

Stage 1: Open Current Alarms.

Alarms are listed with details such as alarm name, severity, alarmed object type, probable cause, subscription (for RP alarms), and timestamp.


Stage 2: The alarm name identifies the impacted component, as shown in the following table.

Alarm name                         Impacted component                                                      Proceed to
TelemetryMappingError              Request processor                                                       Stage 3
TelemetrySubscriptionError         Request processor                                                       Stage 4
TelemetryCollectionDeadlineMissed  gNMI collector                                                          Stage 5
FTPClientFailure                   Accounting processor                                                    Stage 6
AccountingProcessorError           Accounting processor                                                    Stage 7
ApplicationDependencyFailure       Dependent platform services (Kafka, RP, PG, Vertica, File pods)         Stage 8
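If alarm names are exported or handled in a script, the table above can be expressed as a simple lookup. The following is a minimal sketch, not part of the product: the names ALARM_ROUTING and route_alarm are illustrative, and it assumes that a row without its own component entry belongs to the component of the section that covers its stage.

```python
# Alarm names and stage numbers come from the table above.
# ALARM_ROUTING and route_alarm are illustrative names, not product APIs.
ALARM_ROUTING = {
    "TelemetryMappingError": ("Request processor", 3),
    "TelemetrySubscriptionError": ("Request processor", 4),
    "TelemetryCollectionDeadlineMissed": ("gNMI collector", 5),
    "FTPClientFailure": ("Accounting processor", 6),
    "AccountingProcessorError": ("Accounting processor", 7),
    "ApplicationDependencyFailure": ("Dependent platform services", 8),
}


def route_alarm(alarm_name):
    """Return (impacted component, stage number), or None for an unknown alarm name."""
    return ALARM_ROUTING.get(alarm_name)
```

For example, route_alarm("FTPClientFailure") returns the accounting processor and Stage 6, while an alarm name that is not in the table returns None.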


Request processor alarms
 

Stage 3: From the System Health dashboard, open Log Viewer and click Discover. Search for tlm-request-processor to open the logs.

  1. If the logs include “Unable to find transformer” or “Unable to find device helper”, note the NE type mentioned in the error and check that the required CRs are installed:

    1. Open Artifacts, Artifact Bundles.

    2. Select the NE adaptation bundle and choose Table row actions, View Artifacts.

    3. In the Artifact List, verify that the status of the transformer and device helper CRs is Installed.

    4. If the status of the required CR artifacts is not Installed, enable automated retry or reinstall the adaptation bundle; see "How do I retry a failed artifact operation?" and "How do I install an artifact bundle?" in the NSP Network Automation Guide.

  2. If the logs include "Unable to ConstructDevice path", the object filter is incorrect. Update the subscription with a valid object filter.
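When the log text has been copied or exported from Log Viewer, the checks above can be sketched as a small scan. This is a hedged illustration only: the three message fragments are the ones quoted in the steps above, while the names RP_LOG_HINTS and suggest_actions, and the idea of scanning exported lines, are assumptions and not product features.

```python
# Message fragments quoted in the steps above, paired with the suggested action.
# RP_LOG_HINTS and suggest_actions are illustrative names.
RP_LOG_HINTS = [
    ("Unable to find transformer",
     "Check that the transformer CR for the NE type is Installed"),
    ("Unable to find device helper",
     "Check that the device helper CR for the NE type is Installed"),
    ("Unable to ConstructDevice path",
     "Update the subscription with a valid object filter"),
]


def suggest_actions(log_lines):
    """Return the distinct suggested actions for the given log lines, in first-seen order."""
    actions = []
    for line in log_lines:
        for fragment, action in RP_LOG_HINTS:
            if fragment in line and action not in actions:
                actions.append(action)
    return actions
```

Lines that contain none of the fragments produce no action, so an empty result means that the logs need to be read manually.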


Stage 4: A TelemetrySubscriptionError alarm occurs when File Output is selected in a gNMI subscription. File output is supported for accounting only.

  1. Remove File Output from the subscription.

  2. From the System Health dashboard, open Log Viewer and click Discover. Search for tlm-accounting-processor to open the logs.

    Check that the subscription is operating correctly.

  3. Manually clear the alarm from the Current Alarms view.


gNMI collector alarms
 

Stage 5: A TelemetryCollectionDeadlineMissed alarm occurs when collection is attempted on a missing or deleted object.

  1. Check the alarm information in the Current Alarms view to find the missing object.

  2. Re-enable or restore the missing object on the NE.

The alarm clears automatically when the missing object is restored and collection resumes.


Accounting processor alarms
 

Stage 6: If FTP or SFTP is failing:

  1. Open Device Management, Managed Network Elements and select the NE.

    Verify that the NE is reachable.

  2. Choose Mediation Policies in the Summary panel. Click a file transfer policy name to view the details.

    Verify that the credentials in the file transfer policy match the credentials configured on the NE.

  3. From the System Health dashboard, open Log Viewer and click Discover. Search tlm-accounting-processor to open the logs.

    Verify that FTP/SFTP sessions are working.
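As a supplement to the reachability check in step 1, a basic TCP connection test can show whether the file transfer port on the NE answers at all. The sketch below is an assumption-laden example: port_reachable is an illustrative name, port 22 (SFTP) and port 21 (FTP) are the standard defaults and may differ on your NE, and the test must be run from a host with the same network path to the NE as the accounting processor. A successful connection does not validate the credentials in the file transfer policy.

```python
import socket


def port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the port is reachable but transfers still fail, the credential comparison in step 2 and the tlm-accounting-processor logs in step 3 are the next places to look.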


Stage 7: If accounting processing is failing, the accounting files are not configured correctly.

On the NE, verify and correct the accounting file format, syntax, quotes, and supported keywords.


Stage 8: If an accounting dependency is failing, a dependent pod is experiencing a problem.

Identify the failing dependent pod (Kafka, postgres, Vertica, or file pods) and scale it up as needed.
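One way to narrow down the failing dependency, assuming you have kubectl access to the cluster that hosts the deployment, is to inspect the JSON pod list returned by kubectl get pods -o json. The sketch below is illustrative only: pods_not_ready is a hypothetical helper, the namespace is not given in this pathway, and the name fragments you pass in (for example "kafka" or "vertica") are assumptions about how the pods are named in your deployment.

```python
import json


def pods_not_ready(pod_list_json, name_fragments):
    """Return the names of pods that match one of name_fragments and have no
    container statuses yet or at least one container that is not ready."""
    data = json.loads(pod_list_json)
    flagged = []
    for pod in data.get("items", []):
        name = pod["metadata"]["name"]
        # Skip pods that are not one of the dependencies of interest.
        if not any(fragment in name for fragment in name_fragments):
            continue
        statuses = pod.get("status", {}).get("containerStatuses", [])
        if not statuses or not all(c.get("ready", False) for c in statuses):
            flagged.append(name)
    return flagged
```

Pods that have completed normally, such as finished jobs, also report containers that are not ready, so treat the result as a list of candidates to inspect and not as a diagnosis.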