Appendix A: Supported alarms

Equipment alarms

Among equipment alarms, there are several groups of transceiver-related environment alarms with escalating thresholds:

  • low warning: a Warning alarm indicating that a minor low threshold has been crossed.
  • low alarm: a Critical alarm indicating that a major low threshold has been crossed.
  • high warning: a Warning alarm indicating that a minor high threshold has been crossed.
  • high alarm: a Critical alarm indicating that a major high threshold has been crossed.
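The four threshold groups can be illustrated with a small classifier that checks one transceiver reading against its low and high thresholds. This is a minimal sketch, not the product's implementation. It assumes each threshold is checked independently, so a reading past an alarm threshold is also reported as past the matching warning threshold; this list does not say whether the system raises both in that case. The function name and the threshold values are hypothetical and for illustration only.

```python
# Severity of each threshold level, as listed above.
SEVERITY = {
    "low warning": "Warning",
    "low alarm": "Critical",
    "high warning": "Warning",
    "high alarm": "Critical",
}


def crossed_thresholds(value, low_alarm, low_warning, high_warning, high_alarm):
    """Return the threshold levels that `value` has crossed (hypothetical helper).

    Assumes low_alarm < low_warning < high_warning < high_alarm, and that each
    threshold is checked independently, so a reading past an alarm threshold
    is also past the matching warning threshold and both are returned.
    """
    crossed = []
    if value < low_warning:
        crossed.append("low warning")
    if value < low_alarm:
        crossed.append("low alarm")
    if value > high_warning:
        crossed.append("high warning")
    if value > high_alarm:
        crossed.append("high alarm")
    return crossed


# Illustrative input-power thresholds in dBm (not product defaults).
limits = dict(low_alarm=-14.0, low_warning=-12.0, high_warning=2.0, high_alarm=3.0)
for reading in (-15.0, -13.0, 0.0, 2.5):
    levels = crossed_thresholds(reading, **limits)
    print(reading, [(level, SEVERITY[level]) for level in levels])
```

With these example values, a reading of -13.0 dBm crosses only the low warning threshold, while -15.0 dBm crosses both the low warning and the low alarm thresholds.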
Table 1. 1001 Fan tray fault
Alarm ID: 1001
Alarm name: Fan Tray Fault
Description: This alarm is raised when the associated fan tray is in a down, empty, failed, degraded, or low-power operational state. The system may have cooling issues.
Severity: Major
Probable cause: Equipment malfunction
Remedial action: The failed fan unit should be replaced.
Table 2. 1003 Power supply fault
Alarm ID: 1003
Alarm name: Power Supply Fault
Description: The alarm is raised when the associated power supply is not operationally Up. The specified power supply can no longer supply power to the system.
Severity: Critical
Probable cause: Power problem
Remedial action: Check the status of the power supply.
Table 3. 1004 Chassis fault
Alarm ID: 1004
Alarm name: Chassis Fault
Description: The alarm is raised when the chassis is operationally down.
Severity: Critical
Probable cause: Equipment malfunction
Remedial action: Chassis Down
Table 4. 1005 CPM fault
Alarm ID: 1005
Alarm name: CPM Fault
Description: This alarm is generated when the control module is in an operationally down, empty, failed, degraded, or low-power state.
Severity: Critical
Probable cause: Equipment malfunction
Remedial action: Remove the card and reset it. If this does not clear the alarm, contact your Nokia support representative for assistance.
Table 5. 1006 SFM fault
Alarm ID: 1006
Alarm name: SFM Fault
Description: The alarm is raised when the associated SFM module is in an operationally down, empty, failed, degraded, or low-power state. Traffic could be impacted.
Severity: Critical
Probable cause: Equipment malfunction
Remedial action: The active CPM is at risk of failing to initialize after a node reboot because it cannot access the SFM. Contact Nokia customer support.
Table 6. 1007 Line card fault
Alarm ID: 1007
Alarm name: Line Card Fault
Description: The alarm is raised when the specified line card is in an operationally down/empty/failed/degraded/low-power state. Traffic is no longer being transmitted from this line card.
Severity: Major
Probable cause: Equipment malfunction
Remedial action: Ensure that the line card is operationally up. The line card may need to be replaced.
Table 7. 1008 Interface transceiver down
Alarm ID: 1008
Alarm name: Interface Transceiver Down
Description: The alarm is raised when a transceiver goes into the operational Down state as a result of one of the following possible failures:
  • read
  • checksum
  • unknown
  • tx-laser
  • connector
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Ensure that the transceiver is in good operating condition and is compatible with the associated interface. The transceiver may need to be replaced.
Table 8. 1009 Transceiver channel high input power warning
Alarm ID: 1009
Alarm name: Transceiver Channel High Input Power Warning
Description: The alarm is raised when a transceiver's input power exceeds the configured Warning threshold for high input power.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 9. 1010 Transceiver channel high input power alarm
Alarm ID: 1010
Alarm name: Transceiver Channel High Input Power Alarm
Description: The alarm is raised when a transceiver's input power exceeds the configured Alarm threshold for high input power.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 10. 1011 Transceiver channel low input power warning
Alarm ID: 1011
Alarm name: Transceiver Channel Low Input Power Warning
Description: The alarm is raised when a transceiver's input power is below the configured Warning threshold for low input power.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 11. 1012 Transceiver channel low input power alarm
Alarm ID: 1012
Alarm name: Transceiver Channel Low Input Power Alarm
Description: The alarm is raised when a transceiver's input power is below the configured Alarm threshold for low input power.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 12. 1013 Transceiver channel high laser bias current alarm
Alarm ID: 1013
Alarm name: Transceiver Channel High Laser Bias Current Alarm
Description: The alarm is raised when a transceiver's laser bias current exceeds the configured Alarm threshold for high laser bias current.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 13. 1014 Transceiver channel high laser bias current warning
Alarm ID: 1014
Alarm name: Transceiver Channel High Laser Bias Current Warning
Description: The alarm is raised when a transceiver's laser bias current exceeds the configured Warning threshold for high laser bias current.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 14. 1015 Transceiver channel high output power warning
Alarm ID: 1015
Alarm name: Transceiver Channel High Output Power Warning
Description: The alarm is raised when a transceiver's output power exceeds the configured Warning threshold for high output power.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 15. 1016 Transceiver channel high output power alarm
Alarm ID: 1016
Alarm name: Transceiver Channel High Output Power Alarm
Description: The alarm is raised when a transceiver's output power exceeds the configured Alarm threshold for high output power.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 16. 1017 Transceiver channel low output power warning
Alarm ID: 1017
Alarm name: Transceiver Channel Low Output Power Warning
Description: The alarm is raised when a transceiver's output power is below the configured Warning threshold for low output power.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 17. 1018 Transceiver channel low output power alarm
Alarm ID: 1018
Alarm name: Transceiver Channel Low Output Power Alarm
Description: The alarm is raised when a transceiver's output power is below the configured Alarm threshold for low output power.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 18. 1019 Transceiver channel low laser bias current alarm
Alarm ID: 1019
Alarm name: Transceiver Channel Low Laser Bias Current Alarm
Description: The alarm is raised when a transceiver's laser bias current is below the configured Alarm threshold for low laser bias current.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 19. 1020 Transceiver channel low laser bias current warning
Alarm ID: 1020
Alarm name: Transceiver Channel Low Laser Bias Current Warning
Description: The alarm is raised when a transceiver's laser bias current is below the configured Warning threshold for low laser bias current.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 20. 1021 Transceiver low laser bias current warning
Alarm ID: 1021
Alarm name: Transceiver Low Laser Bias Current Warning
Description: The alarm is raised when a transceiver's laser bias current is below the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 21. 1022 Transceiver low laser bias current alarm
Alarm ID: 1022
Alarm name: Transceiver Low Laser Bias Current Alarm
Description: The alarm is raised when a transceiver's laser bias current is below the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 22. 1023 Transceiver high laser bias current warning
Alarm ID: 1023
Alarm name: Transceiver High Laser Bias Current Warning
Description: The alarm is raised when a transceiver's laser bias current is above the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 23. 1024 Transceiver high laser bias current alarm
Alarm ID: 1024
Alarm name: Transceiver High Laser Bias Current Alarm
Description: The alarm is raised when a transceiver's laser bias current is above the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 24. 1025 Transceiver high input power alarm
Alarm ID: 1025
Alarm name: Transceiver High Input Power Alarm
Description: The alarm is raised when a transceiver's input power is above the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 25. 1026 Transceiver high input power warning
Alarm ID: 1026
Alarm name: Transceiver High Input Power Warning
Description: The alarm is raised when a transceiver's input power is above the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 26. 1027 Transceiver low input power warning
Alarm ID: 1027
Alarm name: Transceiver Low Input Power Warning
Description: The alarm is raised when a transceiver's input power is below the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 27. 1028 Transceiver low input power alarm
Alarm ID: 1028
Alarm name: Transceiver Low Input Power Alarm
Description: The alarm is raised when a transceiver's input power is below the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 28. 1029 Transceiver high output power alarm
Alarm ID: 1029
Alarm name: Transceiver High Output Power Alarm
Description: The alarm is raised when a transceiver's output power is above the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 29. 1030 Transceiver high output power warning
Alarm ID: 1030
Alarm name: Transceiver High Output Power Warning
Description: The alarm is raised when a transceiver's output power is above the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 30. 1031 Transceiver low output power warning
Alarm ID: 1031
Alarm name: Transceiver Low Output Power Warning
Description: The alarm is raised when a transceiver's output power is below the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 31. 1032 Transceiver low output power alarm
Alarm ID: 1032
Alarm name: Transceiver Low Output Power Alarm
Description: The alarm is raised when a transceiver's output power is below the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 32. 1033 Transceiver high voltage alarm
Alarm ID: 1033
Alarm name: Transceiver High Voltage Alarm
Description: The alarm is raised when a transceiver's voltage is above the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 33. 1034 Transceiver high voltage warning
Alarm ID: 1034
Alarm name: Transceiver High Voltage Warning
Description: The alarm is raised when a transceiver's voltage is above the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 34. 1035 Transceiver low voltage warning
Alarm ID: 1035
Alarm name: Transceiver Low Voltage Warning
Description: The alarm is raised when a transceiver's voltage is below the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 35. 1036 Transceiver low voltage alarm
Alarm ID: 1036
Alarm name: Transceiver Low Voltage Alarm
Description: The alarm is raised when a transceiver's voltage is below the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that correct cables and transceivers are used on both ends of the link.
Table 36. 1037 Transceiver high temperature alarm
Alarm ID: 1037
Alarm name: Transceiver High Temperature Alarm
Description: The alarm is raised when a transceiver's temperature rises above the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that the correct cables and transceivers are used on both ends of the link. Additionally, verify that fans are operating correctly.
Table 37. 1038 Transceiver high temperature warning
Alarm ID: 1038
Alarm name: Transceiver High Temperature Warning
Description: The alarm is raised when a transceiver's temperature is above the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that the correct cables and transceivers are used on both ends of the link. Additionally, verify that fans are operating correctly.
Table 38. 1039 Transceiver low temperature warning
Alarm ID: 1039
Alarm name: Transceiver Low Temperature Warning
Description: The alarm is raised when a transceiver's temperature is below the configured Warning threshold.
Severity: Warning
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that the correct cables and transceivers are used on both ends of the link. Additionally, verify that fans are operating correctly.
Table 39. 1040 Transceiver low temperature alarm
Alarm ID: 1040
Alarm name: Transceiver Low Temperature Alarm
Description: The alarm is raised when a transceiver's temperature is below the configured Alarm threshold.
Severity: Critical
Probable cause: DTE DCE TRANSCEIVER ERROR
Remedial action: Verify that the correct cables and transceivers are used on both ends of the link. Additionally, verify that fans are operating correctly.

Communication alarms

Table 40. 4001 LLDP adjacency down
Alarm ID: 4001
Alarm name: LLDP Adjacency Down
Description: The alarm is raised when the operational state of an LLDP adjacency is down because the operational state of the local interface is down.
Severity: Major
Probable cause: DTE DCE Interface Error
Remedial action: The operational state of the interface must be up in order for the selected adjacency to be up.
Table 41. 4002 Interface down
Alarm ID: 4002
Alarm name: Interface Down
Description: The alarm is raised when the operational state of an interface is down.
Severity: Critical
Probable cause: DTE DCE Interface Error
Remedial action: The condition exists because the physical interface is down, either because it is administratively disabled or faulty, or because a cabling fault has occurred. Ensure that the interface is administratively up.

Check for a poor cable connection to the port or for a faulty cable/fiber. If neither appears to be the problem, run diagnostics on the port to determine if it is faulty.

Table 42. 4003 Subinterface down
Alarm ID: 4003
Alarm name: Subinterface Down
Description: The alarm is raised when the operational state of a subinterface is down.
Severity: Critical
Probable cause: DTE DCE Interface Error
Remedial action: The condition exists because the subinterface is down, either because it is administratively disabled or faulty, or because a cabling fault has occurred. Ensure that the subinterface is administratively up.

Check for a poor cable connection to the port or for a faulty cable/fiber. If neither appears to be the problem, run diagnostics on the port to determine if it is faulty.

Table 43. 4004 BGP adjacency down
Alarm ID: 4004
Alarm name: BGP Adjacency Down
Description: The alarm is raised when the BGP neighbor state transitions out of the Established state.
Severity: Critical
Probable cause: DTE DCE BGP Error
Remedial action: Verify reachability between the BGP neighbors and verify that their BGP parameters match.
Table 44. 4005 BFD session down
Alarm ID: 4005
Alarm name: BFD Session Down
Description: The alarm is raised when the BFD session is operationally down.
Severity: Critical
Probable cause: DTE DCE BFD Error
Remedial action: Verify reachability between BFD neighbors.
Table 45. 4006 Network instance down
Alarm ID: 4006
Alarm name: Network Instance Down
Description: The alarm is raised when a network-instance is down.
Severity: Critical
Probable cause: DTE DCE NET INST DOWN
Remedial action: Verify the configuration of the network-instance.
Table 46. 4007 Interface LAG member down
Alarm ID: 4007
Alarm name: Interface LAG Member Down
Description: The alarm is raised when a member of a LAG goes into the operational down state.
Severity: Warning
Probable cause: DTE DCE INT LAG DOWN
Remedial action: The condition exists because a physical interface belonging to the LAG is down. The interface could be down because it is administratively disabled or faulty, or because a cabling fault has occurred.

Do the following:

  1. Ensure that the interface is administratively up.
  2. Check for a poor cable connection to the port or for a faulty cable/fiber.
  3. If neither appears to be the problem, run diagnostics on the port to determine if it is faulty.
Table 47. 4040 Network instance interface down
Alarm ID: 4040
Alarm name: Network Instance Interface Down
Description: The alarm is raised when an interface configured within a network-instance is down.
Severity: Critical
Probable cause: DTE DCE NET INST INT DOWN
Remedial action: Verify the operational state of the network instance interface.
Table 48. 4041 Network instance VXLAN interface down
Alarm ID: 4041
Alarm name: Network Instance VXLAN Interface Down
Description: The alarm is raised when a VXLAN interface configured within a network instance is down.
Severity: Critical
Probable cause: DTE DCE NET INST VXLAN INT DOWN
Remedial action: Verify the operational state of the network instance VXLAN interface.
Table 49. 4042 BGP down
Alarm ID: 4042
Alarm name: BGP Down
Description: The alarm is raised when the BGP operational state transitions to the Down state.
Severity: Critical
Probable cause: DTE DCE BGP ERROR
Remedial action: Verify the configuration of BGP on the affected device.
Table 50. 4043 BGP EVPN instance down
Alarm ID: 4043
Alarm name: BGP EVPN Instance Down
Description: The alarm is raised when the BGP operational state transitions to the Down state.
Severity: Critical
Probable cause: DTE DCE BGP ERROR
Remedial action: Verify the configuration of BGP on the affected device.
Table 51. 4044 BGP IPv4 neighbor down
Alarm ID: 4044
Alarm name: BGP IPv4 Neighbor Down
Description: The alarm is raised whenever an IPv4 Unicast BGP family has not been negotiated correctly between two BGP neighbors.
Severity: Major
Probable cause: DTE DCE BGP ERROR
Remedial action: Ensure that both neighbors are exchanging the same BGP families.
Table 52. 4045 BGP IPv6 neighbor down
Alarm ID: 4045
Alarm name: BGP IPv6 Neighbor Down
Description: The alarm is raised whenever an IPv6 Unicast BGP family has not been negotiated correctly between two BGP neighbors.
Severity: Major
Probable cause: DTE DCE BGP ERROR
Remedial action: Ensure that both neighbors are exchanging the same BGP families.
Table 53. 4046 BGP EVPN neighbor down
Alarm ID: 4046
Alarm name: BGP EVPN Neighbor Down
Description: The alarm is raised whenever an EVPN BGP family has not been negotiated correctly between two BGP neighbors.
Severity: Major
Probable cause: DTE DCE BGP ERROR
Remedial action: Ensure that both neighbors are exchanging the same BGP families.

Operational alarms

Table 54. 5001 GNMI connection fault
Alarm ID: 5001
Alarm name: GNMI Connection Fault
Description: The alarm is raised when the GNMI connection to the network element is lost.
Severity: Major
Probable cause: DTE DCE Interface Error
Remedial action: Check network connectivity to restore the GNMI connection.
Table 55. 5002 Memory usage warning
Alarm ID: 5002
Alarm name: Memory Usage Warning
Description: The alarm is raised when a device's memory utilization exceeds 75%.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Check system process memory utilization.
Note: This alarm indicates that memory utilization exceeds 75%. This alarm persists even if the Major alarm for utilization greater than 95% is triggered.
Table 56. 5003 Memory usage major
Alarm ID: 5003
Alarm name: Memory Usage Major
Description: The alarm is raised when a device's memory utilization exceeds 95%.
Severity: Major
Probable cause: SYSTEM WARNING
Remedial action: Check system process memory utilization.
Note: This alarm indicates that memory utilization exceeds 95%. Raising it does not clear the Memory Usage Warning alarm (5002) for utilization above 75%.
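The relationship between alarms 5002 and 5003 can be expressed as two independent threshold checks, so that both alarms are active together above 95%. This is a minimal sketch; the function name is hypothetical, and it assumes "exceeds" means strictly greater than the stated percentage.

```python
def active_memory_alarms(utilization_percent):
    """Return the memory alarm IDs active at a utilization between 0 and 100.

    Hypothetical helper. 5002 is raised above 75% and 5003 above 95%.
    The checks are independent, so 5002 stays active when 5003 is raised.
    """
    alarms = []
    if utilization_percent > 75:
        alarms.append(5002)  # Memory Usage Warning (severity Warning)
    if utilization_percent > 95:
        alarms.append(5003)  # Memory Usage Major (severity Major)
    return alarms


for utilization in (60, 80, 97):
    print(utilization, active_memory_alarms(utilization))
```

At 97% utilization the sketch reports both 5002 and 5003, matching the notes for the two alarms.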
Table 57. 5004 AAA server down
Alarm ID: 5004
Alarm name: AAA Server Down
Description: The alarm is raised when a configured AAA server goes into the operationally down state.
Severity: Major
Probable cause: DTE DCE AAA DOWN
Remedial action: Verify the device configuration and the ability to reach the AAA server from the device.

Fabric Services System alarms

Table 58. 6001 Connect Fabric Services System configuration failed
Alarm ID: 6001
Alarm name: Connect Fabric Services System Configuration Failed
Description: The alarm is raised when changes made through the Plugin API cannot be configured on the Fabric Services System.
Severity: Critical
Probable cause: Configuration or customization error
Remedial action: The condition exists because Connect cannot provision the Fabric Services System with the intended configuration on its Plugin API. Sanitize the Fabric Services System to resolve this error and perform an audit on Connect.
Table 59. 6002 Connect Fabric Services System workload intent deploy failed
Alarm ID: 6002
Alarm name: Connect Fabric Services System Workload Intent Deploy Failed
Description: The alarm is raised when Connect cannot generate the configurations for a workload intent or cannot deploy it.
Severity: Critical
Probable cause: Configuration or customization error
Remedial action: The condition exists because Connect cannot deploy the workload intent on the Fabric Services System. Please make sure the workload intent is in a deployable state and perform an audit on Connect.
Table 60. 6003 Connect Fabric Services System authentication failed
Alarm ID: 6003
Alarm name: Connect Fabric Services System Authentication Failed
Description: The alarm is raised when Connect cannot authenticate with the Fabric Services System.
Severity: Critical
Probable cause: Authentication failure
Remedial action: The condition exists because Connect cannot authenticate with the Fabric Services System. Make sure the Connect configuration is correct and perform an audit on Connect.
Table 61. 6004 Connect plugin heartbeat lost
Alarm ID: 6005
Alarm name: Connect Plugin Heartbeat Lost
Description: The alarm is raised when Connect no longer detects heartbeat messages from one of its plugins.
Severity: Critical
Probable cause: CONNECT PLUGIN HEARTBEAT LOST
Remedial action: The condition exists because Connect cannot detect the presence of one of its plugins. Make sure the plugin is running and actively issuing heartbeat messages.
Table 62. 6005 Connect plugin CMS authentication failure
Alarm ID: 6006
Alarm name: Connect Plugin CMS Authentication Failure
Description: The alarm is raised when a plugin fails to authenticate with the CMS for a given deployment.
Severity: Critical
Probable cause: CONNECT PLUGIN CMS AUTHENTICATION FAILURE
Remedial action: The condition exists because a plugin cannot authenticate with the CMS. Make sure the deployment configuration is correct.
Table 63. 6007 Connect plugin Connect out of sync with CMS
Alarm ID: 6007
Alarm name: Connect Plugin Connect Out of Sync with CMS
Description: The alarm is raised when a plugin re-establishes communication with Connect after registering changes to its deployments while the connection was lost.
Severity: Critical
Probable cause: CONNECT PLUGIN CONNECT OUT OF SYNC WITH CMS
Remedial action: The condition exists because the plugin lost connectivity with Connect for some time. Perform an audit if Connect is out of sync with the CMS.
Table 64. 6008 Connect plugin CMS connectivity failure
Alarm ID: 6008
Alarm name: Connect Plugin CMS Connectivity Failure
Description: The alarm is raised when a plugin has lost communication with the CMS, or has never had connectivity to the CMS, for a given deployment.
Severity: Critical
Probable cause: CONNECT PLUGIN CMS CONNECTIVITY FAILURE
Remedial action: The condition exists because the plugin cannot connect to the CMS. Make sure there is connectivity between the plugin and the CMS.
Table 65. 6009 Connect resource out of sync
Alarm ID: 6009
Alarm name: Connect Resource Out Of Sync
Description: The alarm is raised when a resource in Connect is out of sync.
Severity: Critical
Probable cause: INCORRECT CONFIGURATION
Remedial action: The condition exists because a resource in Connect is out of sync. Run an audit on the deployment this resource belongs to.
Table 66. 6010 Connect plugin CMS certificate verification failure
Alarm ID: 6010
Alarm name: Connect Plugin CMS Certificate Verification Failure
Description: The alarm is raised when a plugin fails to authenticate with the CMS because of the given deployment certificate.
Severity: Critical
Probable cause: CONNECT PLUGIN CMS CERTIFICATE VERIFICATION FAILURE
Remedial action: The condition exists because a plugin cannot authenticate with the CMS using the given deployment certificate. Make sure the deployment certificate is correct.
Table 67. 6011 Connect plugin CMS resource misconfigured
Alarm ID: 6011
Alarm name: Connect Plugin CMS Resource Misconfigured
Description: The alarm is raised when a plugin fails to create resources because a CMS resource is misconfigured.
Severity: Critical
Probable cause: INCORRECT CONFIGURATION
Remedial action: Configure the CMS resource correctly. If the alarm persists after reconfiguration, audit the deployment.
Table 68. 9009 Instance down
Alarm ID: 9009
Alarm name: Instance Down
Description: The alarm is raised when the pod instance has been down for more than 5 minutes.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 69. 9010 Kubernetes pod crash looping
Alarm ID: 9010
Alarm name: Kubernetes Pod Crash Looping
Description: The alarm is raised when the pod instance is consistently crashing upon restart.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 70. 9011 Kubernetes pod not healthy
Alarm ID: 9011
Alarm name: Kubernetes Pod Not Healthy
Description: The alarm is raised when the pod instance has been in a non-ready state for longer than five minutes.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 71. 9012 Kubernetes container OOM killer
Alarm ID: 9012
Alarm name: Kubernetes Container OOM Killer
Description: The alarm is raised when a container in the pod has been OOMKilled multiple times in the last 10 minutes.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
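The time-based conditions in alarms 9009, 9011, and 9012 can be sketched as checks against the current time: down for more than 5 minutes, non-ready for more than 5 minutes, and repeated OOM kills within a 10-minute window. This is a minimal illustration, not how the product evaluates these alarms. The function and parameter names are hypothetical, and treating "multiple times" as two or more kills is an assumption.

```python
from datetime import datetime, timedelta

FIVE_MINUTES = timedelta(minutes=5)
TEN_MINUTES = timedelta(minutes=10)


def pod_alarms(now, down_since=None, not_ready_since=None, oom_kill_times=()):
    """Return the pod alarm IDs whose time conditions hold at `now` (hypothetical helper)."""
    alarms = []
    # 9009: pod instance down for more than 5 minutes.
    if down_since is not None and now - down_since > FIVE_MINUTES:
        alarms.append(9009)
    # 9011: pod instance non-ready for longer than five minutes.
    if not_ready_since is not None and now - not_ready_since > FIVE_MINUTES:
        alarms.append(9011)
    # 9012: OOM kills inside the last 10 minutes; assumption: "multiple" means 2 or more.
    recent = [t for t in oom_kill_times if timedelta(0) <= now - t <= TEN_MINUTES]
    if len(recent) >= 2:
        alarms.append(9012)
    return alarms


now = datetime(2024, 1, 1, 12, 0)
kills = [now - timedelta(minutes=2), now - timedelta(minutes=9), now - timedelta(minutes=20)]
print(pod_alarms(now, down_since=now - timedelta(minutes=6), oom_kill_times=kills))
```

In this example the pod has been down for 6 minutes and has two OOM kills inside the window (the third is 20 minutes old), so 9009 and 9012 apply.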
Table 72. 9013 Kubernetes StatefulSet down
Alarm ID: 9013
Alarm name: Kubernetes Stateful Set Down
Description: The alarm is raised when the Kubernetes StatefulSet is down.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 73. 9014 Kubernetes StatefulSet replicas mismatch
Alarm ID: 9014
Alarm name: Kubernetes Statefulset Replicas Mismatch
Description: The alarm is raised when the StatefulSet does not have the expected number of replicas.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 74. 9015 Kubernetes deployment down
Alarm ID: 9015
Alarm name: Kubernetes Deployment Down
Description: The alarm is raised when the deployment is in a down state.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 75. 9016 Kubernetes deployment replicas mismatch
Alarm ID: 9016
Alarm name: Kubernetes Deployment Replicas Mismatch
Description: The alarm is raised when the deployment does not have the expected number of replicas.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 76. 9017 Kubernetes daemonset rollout stuck
Alarm ID: 9017
Alarm name: Kubernetes Daemonset Rollout Stuck
Description: The alarm is raised when a DaemonSet rollout is stuck: some pods of the DaemonSet are not scheduled or are not ready.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the pod and Kubernetes to check the reason for its state.
Table 77. 9030 Kubernetes node not ready
Alarm ID: 9030
Alarm name: Kubernetes Node Not Ready
Description: The alarm is raised when the Kubernetes node is not ready.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
Table 78. 9031 Kubernetes node out of memory
Alarm ID: 9031
Alarm name: Kubernetes Node Out Of Memory
Description: The alarm is raised when the Kubernetes node has high memory utilization (> 90%).
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
Table 79. 9032 Kubernetes node high CPU load
Alarm ID: 9032
Alarm name: Kubernetes Node High CPU Load
Description: The alarm is raised when the Kubernetes node has a high CPU load (> 80%).
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
Table 80. 9033 Kubernetes node CPU high I/O wait
Alarm ID: 9033
Alarm name: Kubernetes Node CPU High IO Wait
Description: The alarm is raised when the Kubernetes node has a high CPU I/O wait (> 10%).
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
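The node thresholds given in Tables 78 to 80 (memory above 90%, CPU load above 80%, CPU I/O wait above 10%) can be read as simple per-node comparisons. This sketch assumes strict greater-than checks on percentages sampled per node; the `NodeSample` type and `node_alarms` function are hypothetical, and how the system collects, averages, or debounces these metrics is not specified here.

```python
from dataclasses import dataclass

# Thresholds as stated for alarms 9031 to 9033; strict ">" comparison is assumed.
MEMORY_THRESHOLD_PCT = 90.0    # 9031 Kubernetes Node Out Of Memory
CPU_LOAD_THRESHOLD_PCT = 80.0  # 9032 Kubernetes Node High CPU Load
IO_WAIT_THRESHOLD_PCT = 10.0   # 9033 Kubernetes node CPU high I/O wait


@dataclass
class NodeSample:
    """Hypothetical utilization sample for one Kubernetes node, in percent."""
    memory_pct: float
    cpu_load_pct: float
    io_wait_pct: float


def node_alarms(sample: NodeSample) -> list[int]:
    """Return the IDs of the node alarms whose thresholds are exceeded."""
    raised = []
    if sample.memory_pct > MEMORY_THRESHOLD_PCT:
        raised.append(9031)
    if sample.cpu_load_pct > CPU_LOAD_THRESHOLD_PCT:
        raised.append(9032)
    if sample.io_wait_pct > IO_WAIT_THRESHOLD_PCT:
        raised.append(9033)
    return raised


print(node_alarms(NodeSample(memory_pct=95.0, cpu_load_pct=50.0, io_wait_pct=12.5)))  # [9031, 9033]
```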
Table 81. 9034 Kubernetes node out of disk space
Alarm ID: 9034
Alarm name: Kubernetes Node Out Of Disk Space
Description: The alarm is raised when the Kubernetes node is out of disk space.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
Table 82. 9035 Kubernetes node clock not synchronizing
Alarm ID: 9035
Alarm name: Kubernetes Node Clock Not Synchronizing
Description: The alarm is raised when the clock on the Kubernetes node is not synchronizing. Ensure that NTP is configured on this host.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state and make sure NTP is configured, enabled and working.
Table 83. 9036 Kubernetes node out of capacity
Alarm ID: 9036
Alarm name: Kubernetes Node Out Of Capacity
Description: The alarm is raised when the Kubernetes node is out of capacity and cannot support more workloads.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
Table 84. 9037 Kubernetes volume out of disk space
Alarm ID: 9037
Alarm name: Kubernetes Volume Out Of Disk Space
Description: The alarm is raised when the Kubernetes volume has high usage (> 90%).
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the logs of the VM and Kubernetes to check the reason for its state.
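Volume usage for alarm 9037 is a percentage of capacity. This sketch assumes usage is computed as used bytes over capacity bytes and compared with a strict "> 90%" threshold. The inputs could come, for example, from the per-volume statistics that kubelet exposes (such as `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`), but the data source the system actually uses is an assumption. Function names are hypothetical.

```python
def volume_usage_pct(used_bytes: int, capacity_bytes: int) -> float:
    """Return the percentage of a volume's capacity that is in use."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return 100.0 * used_bytes / capacity_bytes


def volume_out_of_space(used_bytes: int, capacity_bytes: int, threshold_pct: float = 90.0) -> bool:
    """Return True if usage exceeds the threshold (90% for alarm 9037)."""
    return volume_usage_pct(used_bytes, capacity_bytes) > threshold_pct


GiB = 1024 ** 3
print(volume_out_of_space(46 * GiB, 50 * GiB))  # True (92% used)
print(volume_out_of_space(40 * GiB, 50 * GiB))  # False (80% used)
```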
Table 85. 9050 Ceph state unhealthy
Alarm ID: 9050
Alarm name: Ceph State Unhealthy
Description: The alarm is raised when the Ceph instance is in an unhealthy state.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph.
Table 86. 9051 Ceph monitor low space
Alarm ID: 9051
Alarm name: Ceph Monitor Low Space
Description: The alarm is raised when the Ceph monitor has low disk space.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph.
Table 87. 9052 Ceph OSD down
Alarm ID: 9052
Alarm name: Ceph OSD Down
Description: The alarm is raised when the Ceph OSD (Object Storage Daemon) is not in a healthy state.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph.
Table 88. 9053 Ceph OSD high latency
Alarm ID: 9053
Alarm name: Ceph OSD High Latency
Description: The alarm is raised when the Ceph OSD (Object Storage Daemon) has high latency.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph.
Table 89. 9054 Ceph OSD low space
Alarm ID: 9054
Alarm name: Ceph OSD Low Space
Description: The alarm is raised when the Ceph OSD (Object Storage Daemon) has low disk space.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph and add more storage if needed.
Table 90. 9055 Ceph PG down
Alarm ID: 9055
Alarm name: Ceph PG Down
Description: The alarm is raised when the Ceph PG (Placement Group) is down. A PG with fewer than the minimum replicas is marked as down.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph and check if all data is available.
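The rule stated for alarm 9055 (a PG with fewer than the minimum replicas is marked as down) can be sketched by counting, for each PG, how many of its replicas sit on OSDs that are up, and comparing that count with the pool's `min_size`. The `PlacementGroup` type and `down_pgs` function are hypothetical, and Ceph's own peering logic for the down state is more involved than this single comparison.

```python
from dataclasses import dataclass


@dataclass
class PlacementGroup:
    """Hypothetical view of a Ceph PG: its ID and the OSDs holding its replicas."""
    pgid: str
    osds: list[int]


def down_pgs(pgs: list[PlacementGroup], up_osds: set[int], min_size: int) -> list[str]:
    """Return the IDs of PGs with fewer than `min_size` replicas on up OSDs."""
    result = []
    for pg in pgs:
        live_replicas = sum(1 for osd in pg.osds if osd in up_osds)
        if live_replicas < min_size:
            result.append(pg.pgid)
    return result


pgs = [PlacementGroup("1.0", [0, 1, 2]), PlacementGroup("1.1", [1, 3, 4])]
print(down_pgs(pgs, up_osds={0, 1, 2}, min_size=2))  # ['1.1']
```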
Table 91. 9056 Ceph PG incomplete
Alarm ID: 9056
Alarm name: Ceph PG Incomplete
Description: The alarm is raised when the Ceph PG (Placement Group) is missing information about writes that might have occurred, or does not have any healthy copies.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph and check if all data is available.
Table 92. 9057 Ceph PG inconsistent
Alarm ID: 9057
Alarm name: Ceph PG Inconsistent
Description: The alarm is raised when the Ceph PG (Placement Group) has objects that have an incorrect size or are missing from one replica.
Severity: Warning
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph and check if all data is available.
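The inconsistency described for alarm 9057 (objects with an incorrect size, or missing from one replica) can be illustrated by comparing the object inventories of a PG's replicas. In Ceph, such inconsistencies are found by scrubbing, which also compares checksums; this sketch only checks presence and size. The input layout (a hypothetical OSD ID mapped to object names and sizes in bytes) and the function name are assumptions.

```python
def inconsistent_objects(replicas: dict[int, dict[str, int]]) -> set[str]:
    """Return the names of objects missing from some replica or whose size differs.

    `replicas` maps a hypothetical OSD ID to {object name: size in bytes}
    for a single PG.
    """
    all_names = set().union(*(objs.keys() for objs in replicas.values()))
    bad = set()
    for name in all_names:
        sizes = [objs.get(name) for objs in replicas.values()]
        # None means the object is missing from that replica.
        if None in sizes or len(set(sizes)) > 1:
            bad.add(name)
    return bad


replicas = {0: {"a": 10, "b": 20}, 1: {"a": 10, "b": 21}, 2: {"a": 10}}
print(sorted(inconsistent_objects(replicas)))  # ['b']
```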
Table 93. 9058 Ceph PG unavailable
Alarm ID: 9058
Alarm name: Ceph PG Unavailable
Description: The alarm is raised when the Ceph PG (Placement Group) is in an unavailable state.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of Ceph and check if all data is available.
Table 94. 9100 GeoRedundancy failed
Alarm ID: 9100
Alarm name: GeoRedundancy Failed
Description: The alarm is raised when Geo Redundancy fails.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of the remote and local clusters and take appropriate action.
Table 95. 9102 GeoRedundancy reconcile failed
Alarm ID: 9102
Alarm name: GeoRedundancy Reconcile Failed
Description: The alarm is raised when a Geo Redundancy Reconcile fails.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of the remote and local cluster services and take appropriate action.
Table 96. 9103 GeoRedundancy audit failed
Alarm ID: 9103
Alarm name: GeoRedundancy Audit Failed
Description: The alarm is raised when a Geo Redundancy Audit fails.
Severity: Critical
Probable cause: SYSTEM WARNING
Remedial action: Inspect the state of the remote and local cluster services and take appropriate action.
Table 97. 9200 Image download failed
Alarm ID: 9200
Alarm name: Image Download Failed
Description: The alarm is raised when an image download fails.
Severity: Major
Probable cause: INCORRECT CONFIGURATION
Remedial action: Inspect and correct the URL or the credentials, and check for connectivity issues.
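The remedial action for alarm 9200 names three areas to check: the URL, the credentials, and connectivity. The sketch below shows one way a failed HTTP(S) download could be triaged into those areas using Python's standard `urllib`. It assumes the image is fetched over HTTP(S); the mapping of status codes to causes and the function names are hypothetical and do not describe how the system itself downloads images.

```python
import urllib.error
import urllib.request


def classify_download_error(exc: Exception) -> str:
    """Map a download exception to the area suggested by the remedial action."""
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "credentials"  # authentication or authorization refused
        if exc.code == 404:
            return "url"  # the server has no resource at that path
        return "server"
    if isinstance(exc, OSError):
        return "connectivity"  # URLError, timeouts, and socket errors
    if isinstance(exc, ValueError):
        return "url"  # malformed URL, for example a missing scheme
    return "unknown"


def try_download(url: str, timeout: float = 10.0) -> str:
    """Attempt a download; return "ok" or the suspected area of failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except Exception as exc:
        return classify_download_error(exc)


print(try_download("not-a-url"))  # url
```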