Facility alarms
Facility alarms overview
Facility alarms give operators a simple way to track and display the basic status of their equipment facilities. Facility alarm support is intended to cover a focused subset of router states that are likely to indicate service impacts (or imminent service impacts) related to the overall state of hardware assemblies (cards, fans, links, and so on).
In the CLI, for brevity, the keyword or command alarm is used for commands related to facility alarms. This chapter may occasionally use the term alarm as a short form for facility alarm.
The CLI display for show routines allows the system operator to easily identify current facility alarm conditions and recently cleared facility alarms without searching event logs or monitoring various card and port show commands to determine the health of basic equipment in the system, such as cards and ports.
The SR OS alarm model is based on RFC 3877, Alarm Management Information Base (MIB), which evolved from the IETF Disman drafts.
Facility alarms versus log events
Facility alarms are different from log events. Facility alarms have a state (at least two states: active and clear) and a duration, and can be modeled with state transition events (raised, cleared). A log event occurs when the state of some object in the system changes; log events notify the operator of a state change (for example, a port going down, an IGP peering session coming up, and so on). Facility alarms show the list of hardware objects that are currently in a bad state. An operator can examine facility alarms at any time, whereas log events can be sent by a router asynchronously as they occur (for example, as an SNMP notification or trap, or a syslog event).
While log events provide notifications about a large number of different types of state changes in SR OS, facility alarms are intended to cover a focused subset of router states that are likely to indicate service impacts (or imminent service impacts) related to the overall state of hardware assemblies (cards, fans, links, and so on).
The facility alarm module processes log events to generate the raised and cleared states of the facility alarms. If a raising log event is suppressed under event-control, the associated facility alarm is not raised. If a clearing log event is suppressed under event-control, it is still processed for the purpose of clearing the associated facility alarm. If a log event is the raising event for a facility alarm and that facility alarm is currently raised, then changing the log event to suppressed clears the facility alarm.
Log event filtering, throttling and discarding of log events during overload do not affect facility alarm processing. In all cases, non-suppressed log events are processed by the facility alarm module before they are discarded.
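The raise and clear rules above can be sketched as a small state model. This is a minimal illustration under stated assumptions, not the SR OS implementation: the class and method names, keying active alarms by (facility alarm ID, object), and the object name "mda-1/1" are all hypothetical. The event names and facility alarm IDs come from the facility alarm list in this chapter, where one clearing event (tmnxChassisNotificationClear) clears several different alarms.

```python
from dataclasses import dataclass, field


@dataclass
class FacilityAlarmModel:
    """Hypothetical model of how log events raise and clear facility alarms."""

    # Facility alarm ID -> (raising log event, clearing log event)
    definitions: dict
    # Log events suppressed under event-control
    suppressed: set = field(default_factory=set)
    # Active alarms, keyed by (facility alarm ID, affected object)
    active: set = field(default_factory=set)

    def process(self, event, obj):
        """Update alarm state; call this before any log filtering or throttling."""
        for alarm_id, (raising, clearing) in self.definitions.items():
            if event == raising:
                # A suppressed raising event does not raise the alarm.
                if event not in self.suppressed:
                    self.active.add((alarm_id, obj))
            elif event == clearing:
                # Clearing events are honored even when suppressed.
                self.active.discard((alarm_id, obj))

    def suppress(self, event):
        """Suppress a log event; any alarm it raised is cleared."""
        self.suppressed.add(event)
        self.active = {
            (alarm_id, obj)
            for (alarm_id, obj) in self.active
            if self.definitions[alarm_id][0] != event
        }


model = FacilityAlarmModel({
    "7-2001-1": ("tmnxEqCardFailure", "tmnxChassisNotificationClear"),
    "7-2005-1": ("tmnxEnvTempTooHigh", "tmnxChassisNotificationClear"),
})
model.process("tmnxEqCardFailure", "mda-1/1")
model.suppress("tmnxEqCardFailure")  # clears 7-2001-1 for mda-1/1
print(model.active)  # set()
```

Because `process` updates alarm state before anything else sees the event, downstream filtering or discarding of the log event cannot change the alarm outcome, which mirrors the ordering described above.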
The figure Log events, facility alarms and LEDs illustrates the relationship between log events, facility alarms, and the LEDs.
Facility alarms are distinct from, and function independently of, other uses of the term alarm in SR OS, such as:
- log events that use the term alarm (tmnxEqPortSonetAlarm)
- alarm configuration in the following contexts:
  - configure card fp hi-bw-mcast-src alarm
  - configure multicast-management multicast-info-policy bundle channel source-override video analyzer alarms
  - configure port ethernet report-alarm
  - configure system thresholds rmon alarm
  - configure system security cpu-protection policy alarm
- memory-use alarms:
  - MD-CLI: configure system thresholds kb-memory-use-alarm
  - classic CLI: configure system thresholds memory-use-alarm
Facility alarm severities and alarm LED behavior
The alarm LEDs on the CPM/CCM reflect the current status of the facility alarms:
- The Critical alarm LED is lit if there are one or more active critical facility alarms.
- The Major and Minor alarm LEDs behave in the same way for active major and minor facility alarms.
- The OT alarm LED is not controlled by the facility alarm module.
The supported alarm severities are as follows:
- Critical (with an associated LED on the CPM/CCM)
- Major (with an associated LED on the CPM/CCM)
- Minor (with an associated LED on the CPM/CCM)
- Warning (no LED)
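The LED rules reduce to a per-severity check over the list of active facility alarms. The following is a minimal sketch that assumes severities are plain lowercase strings. The function name `alarm_leds` is hypothetical. The OT LED is left out because the facility alarm module does not control it.

```python
from collections import Counter

# Severities that have an alarm LED on the CPM/CCM; "warning" has none.
LED_SEVERITIES = ("critical", "major", "minor")


def alarm_leds(active_severities):
    """Return {severity: lit} for the Critical, Major, and Minor LEDs."""
    counts = Counter(active_severities)
    # An LED is lit while one or more active alarms have its severity.
    return {sev: counts[sev] >= 1 for sev in LED_SEVERITIES}


print(alarm_leds(["major", "warning", "major"]))
# {'critical': False, 'major': True, 'minor': False}
```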
Facility alarms inherit their severity from the raising log event.
If the raising log event for a facility alarm is configured with a severity of indeterminate or cleared, the facility alarm is not raised. However, a clearing log event is processed to clear facility alarms regardless of the severity of the clearing log event.
Changing the severity of a raising log event affects only subsequent occurrences of that log event and the facility alarms they raise. Facility alarms that are already raised when their raising log event severity is changed keep their original severity.
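Severity inheritance can be illustrated by copying the event severity into the alarm at the moment it is raised. This is a sketch under stated assumptions, not SR OS code. The `RaisedAlarm` and `try_raise` names are hypothetical. The "major" and "critical" event-control settings for tmnxEnvTempTooHigh are made-up values chosen only for the example.

```python
from dataclasses import dataclass

# Severities that never raise a facility alarm.
NON_RAISING = {"indeterminate", "cleared"}


@dataclass(frozen=True)
class RaisedAlarm:
    alarm_id: str
    severity: str  # copied from the raising log event at raise time


def try_raise(alarm_id, event_severity):
    """Return a RaisedAlarm, or None if the event severity blocks raising."""
    if event_severity in NON_RAISING:
        return None
    return RaisedAlarm(alarm_id, event_severity)


# Hypothetical event-control severities.
severity_of = {"tmnxEnvTempTooHigh": "major"}
first = try_raise("7-2005-1", severity_of["tmnxEnvTempTooHigh"])
severity_of["tmnxEnvTempTooHigh"] = "critical"  # reconfigure the event
# A later occurrence picks up the new severity; the earlier alarm does not.
second = try_raise("7-2005-1", severity_of["tmnxEnvTempTooHigh"])
print(first.severity, second.severity)  # major critical
```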
Facility alarm hierarchy
Facility alarms for child objects are not raised when a parent object fails. For example, when an MDA or XMA fails (or is shut down), no port facility alarms are raised.
When a parent facility alarm is cleared, child facility alarms whose conditions still exist on the node appear in the active facility alarms list. For example, when a port fails there is a port facility alarm, but if the MDA or XMA is later shut down, the port alarm is cleared (and a card alarm is active for the MDA or XMA). If the MDA or XMA comes back into service and the port is still down, then a port alarm becomes active again.
The supported facility alarm hierarchy is as follows (parent objects that are down cause alarms in all children to be masked):
- CPM -> Compact Flash
- CCM -> Compact Flash
- IOM/IMM -> MDA -> Port -> Channel
- XCM -> XMA -> Port
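Because a child alarm is masked only while a parent is down, one way to model the active list is to recompute it from the set of failed objects. In this model, an object reports an alarm only if none of its ancestors has failed. The sketch below assumes that recomputation approach and uses hypothetical object names. It reproduces the port and MDA example above.

```python
# Hypothetical object names; each child points to its parent, following
# IOM/IMM -> MDA -> Port -> Channel and CPM -> Compact Flash.
PARENT = {
    "cf-a1": "cpm-a",
    "mda-1/1": "iom-1",
    "port-1/1/1": "mda-1/1",
    "channel-1/1/1.1": "port-1/1/1",
}


def ancestors(obj):
    """Yield each parent of obj, nearest first."""
    while obj in PARENT:
        obj = PARENT[obj]
        yield obj


def active_alarms(failed):
    """Failed objects whose alarm is not masked by a failed ancestor."""
    return {obj for obj in failed if not any(a in failed for a in ancestors(obj))}


print(active_alarms({"port-1/1/1"}))             # {'port-1/1/1'}
print(active_alarms({"port-1/1/1", "mda-1/1"}))  # {'mda-1/1'}
```

When the MDA returns to service while the port is still down, the failed set shrinks back to `{"port-1/1/1"}` and the port alarm reappears without any extra bookkeeping.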
Facility alarm list
The tables "Facility alarm, facility alarm name, raising log event, sample details string and clearing log event" and "Facility alarm name/raising log event, cause, effect and recovery" list the supported facility alarms.
Facility alarm | Facility alarm name/raising log event | Sample details string | Clearing log event |
---|---|---|---|
295-2430-1 | tmnxPowerSupplyFanFailed | Chassis 1 Power Shelf 1 Power Module 3 fan failed | tmnxPowerSupplyFanFailedClear |
59-2004-1 | linkDown | Interface intf-towards-node-B22 is not operational | linkUp |
64-2091-1 | tmnxSysLicenseInvalid | Error - <reason> record. <hw> will reboot the chassis <timeRemaining> | tmnxSysLicenseValid |
64-2092-1 | tmnxSysLicenseExpiresSoon | The license installed on <hw> expires <timeRemaining> | tmnxSysLicenseValid |
64-2221-1 | tmnxSysStandbyLicensingError | CPM B is not licensed; license record not found | tmnxSysStandbyLicensingReady |
93-2006-1 | tmnxSatSyncIfTimHoldover | Synchronous timing interface on satellite esat-1 is in holdover state | tmnxSatSyncIfTimHoldoverClear |
93-2008-1 | tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)' | Synchronous timing interface on satellite, alarm on reference 1 | tmnxSatSyncIfTimRef1AlarmClear |
93-2008-2 | tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)' | Synchronous timing interface on satellite, alarm on reference 1 | same as 93-2008-1 |
93-2008-3 | tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)' | Synchronous timing interface on satellite, alarm on reference 1 | same as 93-2008-1 |
93-2010-x | same as 93-2008-x but for ref2 | same as 93-2008-x but for ref2 | same as 93-2008-x but for ref2 |
7-2001-1 | tmnxEqCardFailure | Class MDA Module: failed, reason: Mda 1 failed startup tests | tmnxChassisNotificationClear |
7-2003-1 | tmnxEqCardRemoved | Class CPM Module: removed | tmnxEqCardInserted |
7-2004-1 | tmnxEqWrongCard | Class IOM Module: wrong type inserted | tmnxChassisNotificationClear |
7-2005-1 | tmnxEnvTempTooHigh | Chassis 1: temperature too high | tmnxChassisNotificationClear |
7-2011-1 | tmnxEqPowerSupplyRemoved | Power supply 1, power lost | tmnxEqPowerSupplyInserted |
7-2017-1 | tmnxEqSyncIfTimingHoldover | Synchronous Timing interface in holdover state | tmnxEqSyncIfTimingHoldoverClear |
7-2019-1 | tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)' | Synchronous Timing interface, alarm los on reference 1 | tmnxEqSyncIfTimingRef1AlarmClear |
7-2019-2 | tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)' | Synchronous Timing interface, alarm oof on reference 1 | same as 7-2019-1 |
7-2019-3 | tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)' | Synchronous Timing interface, alarm oopir on reference 1 | same as 7-2019-1 |
7-2021-x | same as 7-2019-x but for ref2 | same as 7-2019-x but for ref2 | same as 7-2019-x but for ref2 |
7-2030-x | same as 7-2019-x but for the BITS input | same as 7-2019-x but for the BITS input | same as 7-2019-x but for the BITS input |
7-2033-1 | tmnxChassisUpgradeInProgress | Class CPM Module: software upgrade in progress | tmnxChassisUpgradeComplete |
7-2073-x | same as 7-2019-x but for the BITS2 input | same as 7-2019-x but for the BITS2 input | same as 7-2019-x but for the BITS2 input |
7-2092-1 | tmnxEqPowerCapacityExceeded | The system has reached maximum power capacity <x> watts | tmnxEqPowerCapacityExceededClear |
7-2094-1 | tmnxEqPowerLostCapacity | The system can no longer support configured devices. Power capacity dropped to <x> watts | tmnxEqPowerLostCapacityClear |
7-2096-1 | tmnxEqPowerOverloadState | The system has reached critical power capacity. Increase available power now | tmnxEqPowerOverloadStateClear |
7-2104-1 | tmnxEqLowSwitchFabricCap | The switch fabric capacity is less than the forwarding capacity of IOM 1 because of errors in fabric links | tmnxEqLowSwitchFabricCapClear |
7-2134-1 | tmnxSyncIfTimBITS2048khzUnsup | The revision of 1/1 does not meet the specifications to support the 2048kHz BITS interface type | tmnxSyncIfTimBITS2048khzUnsupClr |
7-2136-1 | tmnxEqMgmtEthRedStandbyRaise | The standby CPM's management Ethernet port A/1 is serving as the system's management Ethernet port | tmnxEqMgmtEthRedStandbyClear |
7-2138-1 | tmnxEqPhysChassPowerSupOvrTmp | Power supply 2 over temperature | tmnxEqPhysChassPowerSupOvrTmpClr |
7-2140-1 | tmnxEqPhysChassPowerSupAcFail | Power supply 1 AC failure | tmnxEqPhysChassPowerSupAcFailClr |
7-2142-1 | tmnxEqPhysChassPowerSupDcFail | Power supply 2 DC failure | tmnxEqPhysChassPowerSupDcFailClr |
7-2144-1 | tmnxEqPhysChassPowerSupInFail | Power supply 1 input failure | tmnxEqPhysChassPowerSupInFailClr |
7-2146-1 | tmnxEqPhysChassPowerSupOutFail | Power supply 1 output failure | tmnxEqPhysChassPowerSupOutFailClr |
7-2148-1 | tmnxEqPhysChassisFanFailure | Fan 2 failed | tmnxEqPhysChassisFanFailureClear |
7-2153-1 | tmnxCpmMemSizeMismatch | The standby CPM A has a different memory size than the active B | tmnxCpmMemSizeMismatchClear |
7-2156-1 | tmnxPhysChassPwrSupWrgFanDir | The front to back fan direction for chassis 1 power supply 1 is not supported | tmnxPhysChassPwrSupWrgFanDirClr |
7-2157-1 | tmnxPhysChassPwrSupPemACRect | Chassis 1 power supply 1 acRec1 failed or missing | tmnxPhysChassPwrSupPemACRectClr |
7-2159-1 | tmnxPhysChassPwrSupInputFeed | Chassis 1 power supply 1 inputFeedA not supplying power | tmnxPhysChassPwrSupInputFeedClr |
7-2161-1 | tmnxEqBpEpromFail | The active CPM is no longer able to access any of backplane EPROMs because of a hardware defect | tmnxEqBpEpromFailClear |
7-2163-1 | tmnxEqBpEpromWarning | The active CPM is no longer to access one backplane EPROM because of a hardware defect but a redundant EPROM is present and accessible. | tmnxEqBpEpromWarningClear |
7-2165-1 | tmnxPhysChassisPCMInputFeed | Chassis 1 pcm 1 1 not supplying power | tmnxPhysChassisPCMInputFeedClr |
7-2190-1 | tmnxPhysChassisPMOutFail | Chassis 1 Power Shelf 1 Power Module 4 output failure | tmnxPhysChassisPMOutFailClr |
7-2192-1 | tmnxPhysChassisPMInputFeed | Chassis 1 Power Shelf 1 Power Module 3 inputFeedA inputFeedB not supplying power | tmnxPhysChassisPMInputFeedClr |
7-2194-1 | tmnxPhysChassisFilterDoorOpen | Filter door is missing or open | tmnxPhysChassisFilterDoorClosed |
7-2196-1 | tmnxPhysChassisPMOverTemp | Chassis 1 Power Shelf 1 over temperature | tmnxPhysChassisPMOverTempClr |
7-2203-x | same as 7-2019-x but for SyncE | same as 7-2019-x but for SyncE | same as 7-2019-x but for SyncE |
7-2205-x | same as 7-2019-x but for E2 | same as 7-2019-x but for E2 | same as 7-2019-x but for E2 |
7-4001-1 | tmnxInterChassisCommsDown | Control communications disrupted between the Active CPM and the chassis | tmnxInterChassisCommsUp |
7-4003-1 | tmnxCpmIcPortDown | CPM Interconnect Port is not operational. Error code = invalid-connection | tmnxCpmIcPortUp |
7-4006-1 | tmnxCpmIcPortSFFRemoved | CPM interconnect port SFF removed | tmnxCpmIcPortSFFInserted |
7-4007-1 | tmnxCpmNoLocalIcPort | CPM A cannot reach the chassis using its local CPM interconnect ports | tmnxCpmLocalIcPortAvail |
7-4017-1 | tmnxSfmIcPortDown | SFM interconnect Port is not operational. Error code = invalid-connection to Fabric 10 IcPort 2 | tmnxSfmIcPortUp |
7-6002-1 | tmnxPowerShelfCommsDown | Chassis 1 Power Shelf 1 lost communication with cpmA | tmnxPowerShelfCommsUp |
7-6005-1 | tmnxPowerShelfOutputStatusDown | Chassis 1 Power Shelf 2 output status switched to off | tmnxPowerShelfOutputStatusUp |
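Several rows in the table above share one raising log event and differ only in the tmnxSyncIfTimingNotifyAlarm attribute, which selects the trailing digit of the facility alarm ID. The lookup below is a sketch of that mapping for the two reference 1 events whose names appear in the table. The dictionary and function names are hypothetical. The ref2, BITS, BITS2, SyncE, and E2 variants are left out because the table does not give their raising event names.

```python
# Base IDs for the reference 1 timing alarms, from the table rows above.
BASE_ID = {
    "tmnxEqSyncIfTimingRef1Alarm": "7-2019",
    "tmnxSatSyncIfTimRef1Alarm": "93-2008",
}
# tmnxSyncIfTimingNotifyAlarm value -> trailing part of the facility alarm ID
SUFFIX = {"los(1)": "1", "oof(2)": "2", "oopir(3)": "3"}


def facility_alarm_id(event, notify_alarm):
    """Return the facility alarm ID for a raising event and attribute value."""
    try:
        return f"{BASE_ID[event]}-{SUFFIX[notify_alarm]}"
    except KeyError:
        raise ValueError(f"no facility alarm for {event} / {notify_alarm}") from None


print(facility_alarm_id("tmnxEqSyncIfTimingRef1Alarm", "oof(2)"))  # 7-2019-2
```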
Facility alarm | Facility alarm name/raising log event | Cause | Effect | Recovery |
---|---|---|---|---|
295-2430-1 |
tmnxPowerSupplyFanFailed |
The tmnxPowerSupplyFanFailed notification is generated when a fan within a particular power-supply has ceased to function normally. |
Cooling to the power-supply may be reduced, potentially leading to overheating. |
The power-supply should be replaced by one with fully-functioning fan elements. |
59-2004-1 |
linkDown |
A linkDown trap signifies that the SNMP entity, acting in an agent role, has detected that the ifOperStatus object for one of its communication links is about to enter the down state from some other state (but not from the notPresent state). |
The indicated interface is taken down. |
If the ifAdminStatus is down then the interface state is deliberate and there is no recovery. If the ifAdminStatus is up then try to determine that cause of the interface going down: cable cut, distal end went down, and so on. |
64-2091-1 |
tmnxSysLicenseInvalid |
Generated when the license becomes invalid for the reason specified in the log event/alarm. |
The system reboots at the end of the time remaining. |
Configure a valid license file location and filename. |
64-2092-1 |
tmnxSysLicenseExpiresSoon |
Generated when the license expires soon. |
The system reboots at the end of the time remaining. |
Configure a valid license file location and filename. |
64-2221-1 |
tmnxSysStandbyLicensingError |
Generated when the standby detects a licensing failure. The reason is specified in tmnxSysLicenseErrorReason. |
The standby CPM may not synchronized and may be put into a failed state. |
Configure a valid license file location and filename, given the value of tmnxSysLicenseErrorReason. |
93-2006-1 |
tmnxSatSyncIfTimHoldover |
The tmnxSatSyncIfTimHoldover notification is generated when the synchronous equipment timing subsystem of the satellite transitions into a holdover state |
The transmit timing of all synchronous interfaces on the satellite are no longer synchronous with the host. This could result in traffic loss. |
Investigate the state of the two input timing references on the satellite and the links between the host and the satellite (the uplinks that drive them for failures). |
93-2008-1 |
tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)' |
The tmnxSatSyncIfTimRef1Alarm notification is generated when an alarm condition on the first timing reference is detected. |
If the other timing reference is free of faults, the satellite no longer has a backup timing reference. If the other timing reference also has a fault, the satellite is likely no longer synchronous with the host. |
Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults. |
93-2008-2 |
tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)' |
The same cause as 93-2008-1 |
The same effect as 93-2008-1 |
Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults. |
93-2008-3 |
tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)' |
The same cause as 93-2008-1 |
The same effect as 93-2008-1 |
Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults. |
93-2010-x |
same as 93-2008-x but for ref2 |
The same cause as 93-2008-x but for ref2 |
The same as 93-2008-x but for ref2 |
The same as 93-2008-x but for ref2 |
7-2001-1 |
tmnxEqCardFailure |
Generated when one of the cards in a chassis has failed. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on. The reason is indicated in the details of the log event or alarm, and also available in the tmnxChassisNotifyCardFailureReason attribute included in the SNMP notification. |
The effect is dependent on the card that has failed. IOM (or XCM) or MDA (or XMA) failure causes a loss of service for all services running on that card. A fabric failure can impact traffic to and from all cards. 7750 SR, 7450 ESS — If the IOM/IMM fails then the two associated MDAs for the slot also go down. 7950 XRS — If one out of two XMAs fails in a XCM slot then the XCM remains up. If only one remaining operational XMA within a XCM slot fails, then the XCM goes into a booting operational state. |
Before taking any recovery steps collect a tech-support file, then try resetting (clear) the card. If unsuccessful, try removing and re-inserting the card. If that does not work then replace the card. |
7-2003-1 |
tmnxEqCardRemoved |
Generated when a card is removed from the chassis. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on. |
The effect is dependent on the card that has been removed. IOM (or XCM) or MDA (or XMA) removal causes a loss of service for all services running on that card. A fabric removal can impact traffic to and from all cards. |
Before taking any recovery steps collect a tech-support file, then try re-inserting the card. If unsuccessful, replace the card. |
7-2004-1 |
tmnxEqWrongCard |
Generated when the wrong type of card is inserted into a slot of the chassis. Even though a card may be physically supported by the slot, it may have been administratively configured to allow only specific card types in a particular slot location. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on. |
The effect is dependent on the card that has been incorrectly inserted. Incorrect IOM (or XCM) or MDA (or XMA) insertion causes a loss of service for all services running on that card. |
Insert the correct card into the correct slot, and ensure the slot is configured for the correct type of card. |
7-2005-1 |
tmnxEnvTempTooHigh |
Generated when the temperature sensor reading on an equipment object is greater than its configured threshold. |
This could be causing intermittent errors and could also cause permanent damage to components. |
Remove or power off the affected cards, or improve the cooling to the node. More powerful fan trays may also be required. |
7-2011-1 |
tmnxEqPowerSupplyRemoved |
Generated when:
|
Reduced power can cause intermittent errors and could also cause permanent damage to components. |
Re-insert the power supply or raise the input voltage to above -42.5 VDC. |
7-2017-1 |
tmnxEqSyncIfTimingHoldover |
Generated when the synchronous equipment timing subsystem transitions into a holdover state. |
Any node-timed ports have very slow frequency drift limited by the central clock oscillator stability. The oscillator meets the holdover requirements of a Stratum 3 and G.813 Option 1 clock. |
Address issues with the central clock input references. |
7-2019-1 |
tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)' |
Generated when an alarm condition on the first timing reference is detected. The type of alarm (los, oof, and so on) is indicated in the details of the log event or alarm, and is also available in the tmnxSyncIfTimingNotifyAlarm attribute included in the SNMP notification. The SNMP notification has the same indexes as those of the tmnxCpmCardTable. |
Timing reference 1 cannot be used as a source of timing into the central clock. |
Address issues with the signal associated with timing reference 1. |
7-2019-2 |
tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)' |
The same cause as 7-2019-1 |
The same effect as 7-2019-1 |
Address issues with the signal associated with timing reference 1. |
7-2019-3 |
tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)' |
The same cause as 7-2019-1 |
The same effect as 7-2019-1 |
Address issues with the signal associated with timing reference 1. |
7-2021-x |
same as 7-2019-x but for ref2 |
The same cause as 7-2019-x but for the second timing reference |
The same as 7-2019-x but for the second timing reference |
The same as 7-2019-x but for the second timing reference |
7-2030-x |
same as 7-2019-x but for the BITS input |
The same cause as 7-2019-x but for the BITS timing reference |
The same as 7-2019-x but for the BITS timing reference |
The same as 7-2019-x but for the BITS timing reference |
7-2033-1 |
tmnxChassisUpgradeInProgress |
The tmnxChassisUp gradeInProgress notification is generated only after a CPM switchover occurs and the new active CPM is running new software, while the IOMs or XCMs are still running old software. This is the start of the upgrade process. The tmnxChassisUpgradeInProgress notification continues to be generated every 30 minutes while at least one IOM or XCM is still running older software. |
A software mismatch between the CPM and IOM or XCM is generally fine for a short duration (during an upgrade) but may not allow for correct long term operation. |
Complete the upgrade of all IOMs or XCMs. |
7-2073-x |
same as 7-2019-x but for the BITS2 input |
The same as 7-2019-x but for the BITS 2 timing reference |
The same as 7-2019-x but for the BITS 2 timing reference |
The same as 7-2019-x but for the BITS 2 timing reference |
7-2092-1 |
tmnxEqPowerCapacityExceeded |
Generated when a device needs power to boot, but there is not enough power capacity to support the device. |
A non-powered device does not boot until the power capacity is increased to support the device. |
Add a new power supply to the system, or change the faulty power supply with a working one. |
7-2094-1 |
tmnxEqPowerLostCapacity |
Generated when a power supply fails or is removed which puts the system in an overloaded situation. |
Devices are powered off in order of lowest power priority until the available power capacity can support the powered devices. |
Add a new power supply to the system, or change the faulty power supply with a working one. |
7-2096-1 |
tmnxEqPowerOverloadState |
Generated when the overloaded power capacity cannot support the power requirements and there are no further devices that can be powered off. |
The system runs a risk of experiencing brownouts while the available power capacity does not meet the required power consumption. |
Add power capacity or manually shutdown devices until the power capacity meets the power needs. |
7-2104-1 |
tmnxEqLowSwitchFabricCap |
The tmnxEqLowSwitchFabricCap alarm is generated when the total switch fabric capacity becomes less than the IOM capacity because of link failures. At least one of the taps on the IOM is below 100% capacity. |
There is diminished switch fabric capacity to forward service-impacting information. |
If the system does not self-recover, the IOM must be rebooted. |
7-2134-1 |
tmnxSyncIfTimBITS2048khzUnsup |
The tmnxSyncIfTimBITS2048khzUnsup notification is generated when the value of tSyncIfTimingAdmBITSIfType is set to 'g703-2048khz (5)' and the CPM does not meet the specifications for the 2048kHz BITS output signal under G.703. |
The BITS input is not used as the Sync reference and the 2048kHz BITS output signal generated by the CPM is squelched. |
Replace the CPM with one that is capable of generating the 2048kHz BITS output signal, or set tSyncIfTimingAdmBITSIfType to a value other than 'g703-2048khz (5)'. |
7-2136-1 |
tmnxEqMgmtEthRedStandbyRaise |
The tmnxEqMgmtEthRedStandbyRaise notification is generated when the active CPM's management Ethernet port goes operationally down and the standby CPM's management Ethernet port is operationally up and now serving as the system's management Ethernet port. |
The management Ethernet port is no longer redundant. The node can be managed via the standby CPM's management Ethernet port only. |
Bring the active CPM's management Ethernet port operationally up. |
7-2138-1 |
tmnxEqPhysChassPowerSupOvrTmp |
Generated when the temperature sensor reading on a power supply module is greater than its configured threshold. |
This could be causing intermittent errors and could also cause permanent damage to components. |
Remove or power off the affected power supply module or improve the cooling to the node. More powerful fan trays may also be required. The power supply itself may be faulty so replacement may be necessary. |
7-2140-1 |
tmnxEqPhysChassPowerSupAcFail |
Generated when an AC failure is detected on a power supply. |
Reduced power can cause intermittent errors and could also cause permanent damage to components. |
First try re-inserting the power supply. If unsuccessful, replace the power supply. |
7-2142-1 |
tmnxEqPhysChassPowerSupDcFail |
Generated when an DC failure is detected on a power supply. |
Reduced power can cause intermittent errors and could also cause permanent damage to components. |
First try re-inserting the power supply. If unsuccessful, then replace the power supply. |
7-2144-1 |
tmnxEqPhysChassPowerSupInFail |
Generated when an input failure is detected on a power supply. |
Reduced power can cause intermittent errors and could also cause permanent damage to components. |
First try re-inserting the power supply. If that does not work, then replace the power supply. |
7-2146-1 |
tmnxEqPhysChassPowerSupOutFail, |
Generated when an output failure is detected on a power supply. |
Reduced power can cause intermittent errors and could also cause permanent damage to components. |
First try re-inserting the power supply. If that does not work, then replace the power supply. |
7-2148-1 |
tmnxEqPhysChassisFanFailure |
Generated when one of the fans in a fan tray has failed. |
This could cause the temperature to rise and result in intermittent errors and potentially permanent damage to components. |
Replace the fan tray immediately, improve the cooling to the node, or reduce the heat being generated in the node by removing cards or powering down the node. |
7-2153-1 |
tmnxCpmMemSizeMismatch |
A tmnxCpmMemSizeMismatch notification is generated when the RAM memory size of the standby CPM (that is, tmnxChassisNotifyCpmCardSlotNum) is different from the active CPM (that is, tmnxChassisNotifyHwIndex). |
There is an increased risk of the memory overflow on the standby CPM during the CPM switchover. |
Use CPMs with the same memory size. |
7-2156-1 |
tmnxPhysChassPwrSupWrgFanDir |
The tmnxPhysChassPwrSupWrgFanDirClr notification is generated when the airflow direction of the power supply's fan is corrected. |
The fan is cooling the power supply in the correct direction. |
No recovery required. |
7-2157-1 |
tmnxPhysChassPwrSupPemACRect |
The tmnxPhysChassPwrSupPemACRect notification is generated if any one of the AC rectifiers for a power supply is in a failed state or is missing. |
There is an increased risk of the power supply failing, causing insufficient power to the system. |
Bring the AC rectifiers back online. |
7-2159-1 |
tmnxPhysChassPwrSupInputFeed |
The tmnxPhysChassPwrSupInputFeed notification is generated if any one of the input feeds for a power supply is not supplying power. |
There is an increased risk of system power brown-outs or black-outs. |
Restore all of the input feeds that are not supplying power. |
7-2161-1 |
tmnxEqBpEpromFail |
The tmnxEqBpEpromFail alarm is generated when the active CPM is no longer able to access any of backplane EPROMs because of a hardware defect. |
The active CPM is at risk of failing to initialize after node reboot because of not being able to access the BP EPROM to read the chassis type. |
The system does not self-recover and Nokia Support has to be contacted for further instructions. |
7-2163-1 |
tmnxEqBpEpromWarning |
The tmnxEqBpEpromWarning alarm is generated when the active CPM is no longer to access one backplane EPROM because of a hardware defect but a redundant EPROM is present and accessible. |
There is no effect on system operation. |
No recovery action required. |
7-2165-1 |
tmnxPhysChassisPCMInputFeed |
The tmnxPhysChassisPCMInputFeed notification is generated if any one of the input feeds for a PCM has gone offline. |
There is an increased risk of system power brown-outs or black-outs. |
Restore all of the input feeds that are not supplying power. |
7-2190-1 |
tmnxPhysChassisPMOutFail |
The tmnxPhysChassisPMOutFail notification is generated when an output failure occurs on the power module. |
The power module is no longer operational. |
Insert a new power module. |
7-2192-1 |
tmnxPhysChassisPMInputFeed |
The tmnxPhysChassisPMInputFeed notification is generated if any one of the input feeds for a power module is not supplying power. |
There is an increased risk of system power brownouts or blackouts. |
Restore all of the input feeds that are not supplying power. |
7-2194-1 |
tmnxPhysChassisFilterDoorOpen |
The tmnxPhysChassisFilterDoorOpen notification is generated when the filter door is either open or not present. |
Power shelf protection may be compromised. |
If the filter door is not installed, install it. Close the filter door. |
7-2196-1 |
tmnxPhysChassisPMOverTemp |
The tmnxPhysChassisPMOverTemp notification is generated when a power module's temperature surpasses the temperature threshold. |
The power module is no longer operational. |
Check input feed or insert a new power module. |
7-2203-x |
same as 7-2019-x but for SyncE |
The same cause as 7-2019-x but for SyncE |
same as 7-2019-x but for SyncE |
same as 7-2019-x but for SyncE |
7-2205-x |
same as 7-2019-x but for E2 |
The same cause as 7-2019-x but for E2 |
same as 7-2019-x but for E2 |
same as 7-2019-x but for E2 |
7-4001-1 |
tmnxInterChassisCommsDown |
The tmnxInterChassis CommsDown alarm is generated when the active CPM cannot reach the far-end chassis. |
The resources on the far-end chassis are not available. This event for the far-end chassis means that the CPM, SFM, and XCM cards in the far-end chassis reboot and remain operationally down until communications are re-established. |
Ensure that all CPM interconnect ports in the system are properly cabled together with working cables. |
7-4003-1 |
tmnxCpmIcPortDown |
The tmnxCpmIcPort Down alarm is generated when the CPM interconnect port is not operational. The reason may be a cable connected incorrectly, a disconnected cable, a faulty cable, or a misbehaving CPM interconnect port or card. |
At least one of the control plane paths used for inter-chassis CPM communication is not operational. Other paths may be available. |
A manual verification and testing of each CPM interconnect port is required to ensure fully functional operation. Physical replacement of cabling may be required. |
7-4006-1 |
tmnxCpmIcPortSFFRemoved |
The tmnxCpmIcPortSFFRemoved notification is generated when the SFF (for example, a QSFP) is removed from the CPM interconnect port. Removing an SFF generates both this notification and a tmnxCpmIcPortDown event. |
Removing the SFF causes the CPM interconnect port to go down. The port can no longer be used as part of the control plane between chassis, but other paths may be available. |
Insert a working SFF into the port. |
7-4007-1 |
tmnxCpmNoLocalIcPort |
The tmnxCpmNoLocalIcPort alarm is generated when the CPM cannot reach the other chassis using its local CPM interconnect ports. |
Another control communications path to the other chassis may still be available through the mate CPM in the same chassis. If that alternative path is not available, control communications to the other chassis are completely disrupted and the tmnxInterChassisCommsDown alarm is raised. A tmnxCpmNoLocalIcPort alarm on the active CPM indicates that a further failure of the local CPM interconnect ports on the standby CPM would completely disrupt control communications to the other chassis and raise the tmnxInterChassisCommsDown alarm. A tmnxCpmNoLocalIcPort alarm on the standby CPM indicates that a CPM switchover may temporarily disrupt control communications to the other chassis while the rebooting CPM comes back into service. |
Ensure that all CPM interconnect ports in the system are properly cabled together with working cables. |
7-4017-1 |
tmnxSfmIcPortDown |
The tmnxSfmIcPortDown alarm is generated when the SFM interconnect port is not operational. The reason may be a cable connected incorrectly, a disconnected cable, a faulty cable, or a misbehaving SFM interconnect port or SFM card. |
This port can no longer be used as part of the user plane fabric between chassis. Other fabric paths may be available, in which case there is no loss of capacity. |
Manually verify and test each SFM interconnect port to ensure fully functional operation. The cabling may need to be physically replaced. |
7-6002-1 |
tmnxPowerShelfCommsDown |
The tmnxPowerShelfCommsDown alarm is generated when communication with the power shelf controller is lost. |
If there is a power failure, it is not detected because the power modules cannot be polled. The system continues to report the state of the power modules as they were when last seen. |
Correct the power shelf controller communications problem. |
7-6005-1 |
tmnxPowerShelfOutputStatusDown |
The tmnxPowerShelfOutputStatusDown alarm is generated when the physical output switch on the power shelf is set to Standby. |
The power output from the identified power shelf is switched off and does not supply power to the system. |
Set the output switch to On to restore power output. |
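The tmnxCpmNoLocalIcPort row above describes a redundancy rule. Each CPM can reach the other chassis either through its own interconnect ports or through the mate CPM. The following Python sketch encodes that rule under simplifying assumptions. There is exactly one active and one standby CPM. Each CPM's local interconnect ports are reduced to a single up/down flag. The path through the mate CPM is treated as usable whenever the mate's local ports work. The function `predicted_alarms` is hypothetical and is a teaching aid, not an SR OS interface.

```python
def predicted_alarms(active_local_ok: bool, standby_local_ok: bool) -> set:
    """Predict inter-chassis CPM alarms in a simplified two-CPM model.

    Hypothetical helper, not an SR OS API. Each flag states whether that CPM
    can reach the other chassis through its own (local) CPM interconnect ports.
    """
    alarms = set()
    if not active_local_ok:
        # Active CPM lost its local paths but may still reach the other
        # chassis through the mate (standby) CPM.
        alarms.add(("tmnxCpmNoLocalIcPort", "active CPM"))
    if not standby_local_ok:
        # A CPM switchover may temporarily disrupt control communications.
        alarms.add(("tmnxCpmNoLocalIcPort", "standby CPM"))
    if not active_local_ok and not standby_local_ok:
        # Neither the local path nor the path through the mate CPM works.
        alarms.add(("tmnxInterChassisCommsDown", "system"))
    return alarms


# Walk all four combinations of local interconnect health.
for active_ok in (True, False):
    for standby_ok in (True, False):
        print(active_ok, standby_ok, sorted(predicted_alarms(active_ok, standby_ok)))
```

Losing only one CPM's local ports raises tmnxCpmNoLocalIcPort by itself in this model. Both must fail before tmnxInterChassisCommsDown appears.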
The supported ESA facility alarms are listed in two tables: ESA facility alarm, facility alarm name, raising log event, sample details string, and clearing log event; and ESA facility alarm name/raising log event, cause, effect, and recovery.
ESA Event and states | Facility Alarm | Severity | Facility alarm name/raising log event | Sample details string | Clearing log event |
---|---|---|---|---|---|
ESA HW Status: degraded | 2400-1 | Major | tmnxEsaHwStatusDegraded | ESA 3 aggregate hardware status degraded | tmnxEsaHwStatusDegradedClr |
ESA HW Status: critical | 2402-1 | Critical | tmnxEsaHwStatusCritical | ESA 3 aggregate hardware status critical | tmnxEsaHwStatusCriticalClr |
ESA HW Power Supply 1 Degraded | 2404-1 | Critical | tmnxEsaHwPwrSup1Degraded | ESA 3 power supply 1 status degraded | tmnxEsaHwPwrSup1DegradedClr |
ESA HW Power Supply 1 Failed | 2406-1 | Critical | tmnxEsaHwPwrSup1Failed | ESA 3 power supply 1 status failed | tmnxEsaHwPwrSup1FailedClr |
ESA HW Power Supply 2 Degraded | 2408-1 | Critical | tmnxEsaHwPwrSup2Degraded | ESA 3 power supply 2 status degraded | tmnxEsaHwPwrSup2DegradedClr |
ESA HW Power Supply 2 Failed | 2410-1 | Critical | tmnxEsaHwPwrSup2Failed | ESA 3 power supply 2 status failed | tmnxEsaHwPwrSup2FailedClr |
ESA HW Fan Bank Non Redundant | 2412-1 | Major | tmnxEsaHwFanBankNonRedun | ESA 3 fan bank redundancy degraded | tmnxEsaHwFanBankNonRedunClr |
ESA HW Fan Bank Failed Redundancy | 2414-1 | Critical | tmnxEsaHwFanBankFailRedun | ESA 3 fan bank redundancy failed | tmnxEsaHwFanBankFailRedunClr |
ESA HW Fan Status Degraded | 2416-1 | Critical | tmnxEsaHwFanStatusDegraded | ESA 3 fan status degraded | tmnxEsaHwFanStatusDegradedClr |
ESA HW Fan Status Failed | 2418-1 | Critical | tmnxEsaHwFanStatusFailed | ESA 3 fan status failed | tmnxEsaHwFanStatusFailedClr |
ESA HW Power Supply Mismatch | 2420-1 | Major | tmnxEsaHwPwrSupMismatch | ESA 3 power supply mismatch | tmnxEsaHwPwrSupMismatchClr |
ESA HW Power Supply Bank Non Redundant | 2422-1 | Major | tmnxEsaHwPwrSupBankNonRedun | ESA 3 power supply bank redundancy degraded | tmnxEsaHwPwrSupBankNonRedunClr |
ESA HW Temperature Degraded | 2426-1 | Critical | tmnxEsaHwTemperatureDegraded | ESA 3 temperature status degraded | tmnxEsaHwTemperatureDegradedClr |
ESA HW Temperature Failed | 2428-1 | Critical | tmnxEsaHwTemperatureFailed | ESA 3 temperature status failed | tmnxEsaHwTemperatureFailedClr |
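In the table above, every clearing log event is the raising log event name followed by `Clr`. This pairing is what lets a facility alarm be modeled as a state that raised and cleared events move in and out of. The following Python sketch replays a stream of log events into active and cleared alarm lists using a subset of the table's rows. It assumes that events arrive as `(ESA ID, log event name)` tuples, for example parsed from syslog by your own tooling. The `replay` function and the event format are hypothetical and do not represent an SR OS API.

```python
# Subset of the ESA facility alarm table:
# raising log event -> (facility alarm ID, severity).
ESA_ALARMS = {
    "tmnxEsaHwStatusDegraded": ("2400-1", "Major"),
    "tmnxEsaHwStatusCritical": ("2402-1", "Critical"),
    "tmnxEsaHwFanBankNonRedun": ("2412-1", "Major"),
    "tmnxEsaHwFanBankFailRedun": ("2414-1", "Critical"),
    "tmnxEsaHwTemperatureFailed": ("2428-1", "Critical"),
}

# In the table, each clearing log event is the raising event name plus "Clr".
CLEAR_SUFFIX = "Clr"


def replay(events):
    """Replay (esa_id, event_name) log events into facility-alarm state.

    Returns (active, cleared): active maps (esa_id, raising event) to severity;
    cleared lists (esa_id, facility alarm ID) in the order the alarms cleared.
    """
    active = {}
    cleared = []
    for esa_id, name in events:
        raising = name[: -len(CLEAR_SUFFIX)] if name.endswith(CLEAR_SUFFIX) else None
        if raising in ESA_ALARMS:
            # Clearing event: the alarm leaves the active list, if it was raised.
            if active.pop((esa_id, raising), None) is not None:
                cleared.append((esa_id, ESA_ALARMS[raising][0]))
        elif name in ESA_ALARMS:
            # Raising event: the alarm becomes (or stays) active.
            active[(esa_id, name)] = ESA_ALARMS[name][1]
    return active, cleared


active, cleared = replay([
    (3, "tmnxEsaHwFanBankNonRedun"),
    (3, "tmnxEsaHwStatusDegraded"),
    (3, "tmnxEsaHwFanBankNonRedunClr"),
])
print(active)   # {(3, 'tmnxEsaHwStatusDegraded'): 'Major'}
print(cleared)  # [(3, '2412-1')]
```

A clearing event for an alarm that was never raised is ignored in this sketch rather than recorded as cleared.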
Facility Alarm | Facility alarm name/raising log event | Cause | Effect | Recovery |
---|---|---|---|---|
2400-1 | tmnxEsaHwStatusDegraded | Generated when the ESA hardware status is degraded | Service may be affected | Contact Nokia customer support |
2402-1 | tmnxEsaHwStatusCritical | Generated when one or more ESA hardware statuses are critical | Service may be affected | Contact Nokia customer support |
2404-1 | tmnxEsaHwPwrSup1Degraded | Generated when the ESA power supply 1 is degraded | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support |
2406-1 | tmnxEsaHwPwrSup1Failed | Generated when the ESA power supply 1 fails | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support |
2408-1 | tmnxEsaHwPwrSup2Degraded | Generated when the ESA power supply 2 is degraded | Power supply operation and reliability may be affected | Contact Nokia customer support |
2410-1 | tmnxEsaHwPwrSup2Failed | Generated when the ESA power supply 2 fails | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support |
2412-1 | tmnxEsaHwFanBankNonRedun | Generated when 1 to 6 of the 7 fans have failed | ESA cooling may be inadequate | Contact Nokia customer support |
2414-1 | tmnxEsaHwFanBankFailRedun | Generated when all 7 fans have failed | ESA cooling is inadequate; the ESA may shut down | Contact Nokia customer support |
2416-1 | tmnxEsaHwFanStatusDegraded | Generated when one or more ESA fans are degraded | ESA cooling may be inadequate | Contact Nokia customer support |
2418-1 | tmnxEsaHwFanStatusFailed | Generated when one or more ESA fans fail | ESA cooling may be inadequate | Contact Nokia customer support |
2420-1 | tmnxEsaHwPwrSupMismatch | Generated when the ESA power supplies do not match | ESA power supplies must be matched | Equip the ESA with matching power supplies |
2422-1 | tmnxEsaHwPwrSupBankNonRedun | Generated when one ESA power supply has failed and the other is running | The ESA fails if the remaining power supply also fails | Contact Nokia customer support |
2426-1 | tmnxEsaHwTemperatureDegraded | Generated when one or more ESA temperatures are outside the expected operating range | If the ESA temperature remains outside the expected operating range, the ESA may shut down | The ESA may need maintenance to rectify the issue. Contact Nokia customer support. |
2428-1 | tmnxEsaHwTemperatureFailed | Generated when the ESA temperature is critical | If the ESA temperature remains outside the expected operating range, the ESA may shut down | The ESA may need maintenance to rectify the issue. Contact Nokia customer support. |
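The cause column for alarms 2412-1 and 2414-1 defines a simple threshold on the number of failed fans in the 7-fan ESA bank. The following Python sketch restates that rule, which can be handy when writing test fixtures for a monitoring system. The function `fan_bank_alarm` is hypothetical. It covers only these two rows and ignores the per-fan status alarms (2416-1, 2418-1).

```python
TOTAL_FANS = 7  # ESA fan bank size stated in the cause column


def fan_bank_alarm(failed_fans: int):
    """Return (alarm ID, raising event, severity) for a failed-fan count, or None."""
    if not 0 <= failed_fans <= TOTAL_FANS:
        raise ValueError(f"failed_fans must be between 0 and {TOTAL_FANS}")
    if failed_fans == 0:
        return None  # neither fan-bank row applies
    if failed_fans < TOTAL_FANS:
        # 1 to 6 of the 7 fans have failed.
        return ("2412-1", "tmnxEsaHwFanBankNonRedun", "Major")
    # All 7 fans have failed.
    return ("2414-1", "tmnxEsaHwFanBankFailRedun", "Critical")


print(fan_bank_alarm(3))  # ('2412-1', 'tmnxEsaHwFanBankNonRedun', 'Major')
```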
The linkDown facility alarm is supported for the objects listed in the linkDown Facility Alarm support table. Not all objects are supported on all platforms.
Object | Supported |
---|---|
Ethernet ports | Yes |
SONET section, line, and path (POS) | Yes |
TDM ports (E1, T1, DS3), including CES MDAs | Yes |
TDM channels (DS3 channel configured in an STM-1 port) | Yes |
Ethernet LAGs | No |
APS groups | No |
Ethernet VLANs | No |
Configuring facility alarms with CLI
This section describes how to configure facility alarms using the command line interface.
Enabling facility alarms
The following example shows how to enable facility alarms.
MD-CLI
[ex:/configure system alarms]
A:admin@node-2# info
admin-state enable
classic CLI
A:node-2>config>system# alarms
#------------------------------------------
no shutdown
exit
----------------------------------------------
Common configuration tasks
Configuring the maximum number of cleared alarms
Use the following command to configure the maximum number of entries kept in the list of cleared alarms.
configure system alarms max-cleared
MD-CLI
[ex:/configure system alarms]
A:admin@node-2# info
max-cleared 100
classic CLI
*A:node-2>config>system>alarms# max-cleared 100
A:node-2>config>system>alarms# info
----------------------------------------------
max-cleared 100
----------------------------------------------
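The max-cleared setting bounds the cleared-alarm list. The following Python sketch models such a bounded list with `collections.deque(maxlen=...)`. It rests on an assumption that this guide does not confirm: once the list is full, each newly cleared alarm displaces the oldest entry. The class `ClearedAlarmList` is hypothetical. Its default of 100 mirrors the value used in the example above and is not necessarily the SR OS default.

```python
from collections import deque


class ClearedAlarmList:
    """Hypothetical model of a cleared-alarm list capped by max-cleared.

    Assumption (not confirmed by this guide): once the list holds
    max_cleared entries, recording a new clear discards the oldest one.
    """

    def __init__(self, max_cleared: int = 100):
        # A deque with maxlen silently drops from the left when full.
        self._entries = deque(maxlen=max_cleared)

    def record_clear(self, alarm_id: str, details: str) -> None:
        self._entries.append((alarm_id, details))

    def entries(self) -> list:
        # Oldest first, in the order the alarms cleared.
        return list(self._entries)


cleared = ClearedAlarmList(max_cleared=2)
cleared.record_clear("2412-1", "ESA 3 fan bank redundancy degraded")
cleared.record_clear("2400-1", "ESA 3 aggregate hardware status degraded")
cleared.record_clear("2428-1", "ESA 3 temperature status failed")
print([alarm_id for alarm_id, _ in cleared.entries()])  # ['2400-1', '2428-1']
```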