Facility alarms

Facility alarms overview

Facility alarms give operators a way to track and display the basic status of their equipment facilities. Facility alarm support covers a focused subset of router states that are likely to indicate service impacts (or imminent service impacts) related to the overall state of hardware assemblies (cards, fans, links, and so on).

In the CLI, for brevity, the keyword or command alarm is used for commands related to facility alarms. This chapter may occasionally use the term alarm as a short form for facility alarm.

The CLI show commands let the system operator identify current facility alarm conditions and recently cleared facility alarms without searching event logs or running multiple card and port show commands to determine the health of basic equipment in the system, such as cards and ports.

The SR OS alarm model is based on RFC 3877, Alarm Management Information Base (MIB), which evolved from the IETF Disman drafts.

Facility alarms versus log events

Facility alarms are different from log events. A facility alarm has a state (at least two states: active and clear) and a duration, and can be modeled with state transition events (raised, cleared). A log event occurs when the state of some object in the system changes. Log events notify the operator of a state change (for example, a port going down, an IGP peering session coming up, and so on). Facility alarms show the list of hardware objects that are currently in a bad state. An operator can examine facility alarms at any time, whereas a router sends log events asynchronously when they occur (for example, as an SNMP notification or trap, or a syslog event).
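As an illustration of this distinction, the following Python sketch models a facility alarm as a record that has a state and a duration and is driven by raised and cleared transitions. The FacilityAlarm class and its field names are assumptions made for this example only; they do not correspond to any SR OS data structure or MIB object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FacilityAlarm:
    """Hypothetical record for one facility alarm (illustration only)."""
    name: str
    raised_at: datetime                    # set by the raising log event
    cleared_at: Optional[datetime] = None  # set by the clearing log event

    @property
    def state(self) -> str:
        # An alarm is active until a clearing transition is recorded.
        return "active" if self.cleared_at is None else "clear"

    def duration(self, now: datetime) -> timedelta:
        # Active alarms are measured up to "now"; cleared alarms up to the clear time.
        end = self.cleared_at if self.cleared_at is not None else now
        return end - self.raised_at


t0 = datetime(2024, 1, 1, 12, 0, 0)        # arbitrary example timestamp
alarm = FacilityAlarm("linkDown", raised_at=t0)
state_while_raised = alarm.state
alarm.cleared_at = t0 + timedelta(minutes=7)
state_after_clear = alarm.state
total = alarm.duration(now=t0 + timedelta(hours=1))
```

Unlike a log event, which is a single point in time, the record can be queried at any moment for its current state and for how long the condition has lasted.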

While log events provide notifications about a large number of different types of state changes in SR OS, facility alarms are intended to cover a focused subset of router states that are likely to indicate service impacts (or imminent service impacts) related to the overall state of hardware assemblies (cards, fans, links, and so on).

The facility alarm module processes log events to generate the raised and cleared state for the facility alarms. If a raising log event is suppressed under event-control, then the associated facility alarm is not raised. If a clearing log event is suppressed under event-control, then it is still processed for the purpose of clearing the associated facility alarm. If a log event is a raising event for a facility alarm, and the associated facility alarm is raised, then changing the log event to suppress clears the associated facility alarm.

Log event filtering, throttling and discarding of log events during overload do not affect facility alarm processing. In all cases, non-suppressed log events are processed by the facility alarm module before they are discarded.
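The event-control rules above can be summarized in a small Python model. This is a sketch only: the FacilityAlarmModule class, its method names, and the one-to-one pairing of raising and clearing events are simplifying assumptions for illustration, not the SR OS implementation (in the real system, several raising events can share one clearing event).

```python
class FacilityAlarmModule:
    """Toy model of the raise/clear rules; all names are illustrative."""

    def __init__(self, raise_to_clear):
        # Map of raising log event name -> clearing log event name.
        self.raise_to_clear = dict(raise_to_clear)
        self.clear_to_raise = {c: r for r, c in self.raise_to_clear.items()}
        self.suppressed = set()  # log events set to suppress under event-control
        self.active = set()      # raising events whose alarm is currently raised

    def suppress(self, event):
        self.suppressed.add(event)
        # Suppressing a raising event clears its already-raised alarm.
        self.active.discard(event)

    def on_log_event(self, event):
        if event in self.raise_to_clear:
            # A suppressed raising event does not raise the alarm.
            if event not in self.suppressed:
                self.active.add(event)
        elif event in self.clear_to_raise:
            # A clearing event is processed even when it is suppressed.
            self.active.discard(self.clear_to_raise[event])


fam = FacilityAlarmModule({"linkDown": "linkUp"})
fam.on_log_event("linkDown")
raised_normally = "linkDown" in fam.active

fam.suppress("linkUp")
fam.on_log_event("linkUp")
cleared_by_suppressed_event = "linkDown" not in fam.active

fam.on_log_event("linkDown")
fam.suppress("linkDown")
cleared_by_suppression = "linkDown" not in fam.active

fam.on_log_event("linkDown")
not_raised_when_suppressed = "linkDown" not in fam.active
```

Filtering, throttling, and overload discard are deliberately absent from the model, because they act on log events after the facility alarm module has already processed them.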

Figure 1, Log events, facility alarms and LEDs, illustrates the relationship between log events, facility alarms, and the LEDs.

Figure 1. Log events, facility alarms and LEDs

Facility alarms are different and have independent functionality from other uses of the term alarm in SR OS such as:

  • log events that use the term alarm (tmnxEqPortSonetAlarm)
  • alarms configuration in the following contexts.

    configure card fp hi-bw-mcast-src alarm
    configure multicast-management multicast-info-policy bundle channel source-override video analyzer alarms
    configure port ethernet report-alarm
    configure system thresholds rmon alarm
    configure system security cpu-protection policy alarm
  • memory-use alarms:
    • MD-CLI
      configure system thresholds kb-memory-use-alarm
    • classic CLI
      configure system thresholds memory-use-alarm

Facility alarm severities and alarm LED behavior

The alarm LEDs on the CPM/CCM reflect the current status of the facility alarms:

  • The Critical alarm LED is lit if there are one or more active critical facility alarms

  • Similarly, the Major and Minor alarm LEDs are lit if there are one or more active major or minor facility alarms, respectively

  • The OT alarm LED is not controlled by the facility alarm module

The supported alarm severities are as follows:

  • Critical (with an associated LED on the CPM/CCM)

  • Major (with an associated LED on the CPM/CCM)

  • Minor (with an associated LED on the CPM/CCM)

  • Warning (no LED)

Facility alarms inherit their severity from the raising log event.

If the raising log event for a facility alarm is configured with a severity of indeterminate or cleared, the facility alarm is not raised. However, a clearing log event is always processed to clear facility alarms, regardless of the severity of the clearing log event.

Changing the severity of a raising log event only affects subsequent occurrences of that log event and facility alarms. Facility alarms that are already raised when their raising log event severity is changed maintain their original severity.
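The severity and LED rules in this section can be sketched in Python as follows. The function names, the dictionary layout, and the severities assigned to the example alarms are assumptions chosen for illustration; they are not SR OS defaults or APIs.

```python
LED_SEVERITIES = ("critical", "major", "minor")   # severities with an LED
ALARM_SEVERITIES = LED_SEVERITIES + ("warning",)  # warning has no LED


def raise_alarm(active, alarm_id, event_severity):
    """Record an alarm with the severity of its raising log event."""
    # Raising events configured as indeterminate or cleared raise nothing.
    if event_severity in ALARM_SEVERITIES:
        # setdefault keeps the original severity of an already-raised alarm.
        active.setdefault(alarm_id, event_severity)
    return active


def led_state(active):
    """An LED is lit when one or more active alarms have that severity."""
    return {sev: sev in active.values() for sev in LED_SEVERITIES}


active = {}
raise_alarm(active, "7-2005-1", "critical")       # example severity only
raise_alarm(active, "59-2004-1", "warning")       # raised, but no LED
raise_alarm(active, "7-2148-1", "indeterminate")  # not raised
raise_alarm(active, "7-2005-1", "minor")          # severity changed later
leds = led_state(active)
```

In this run the Critical LED is lit, the Major and Minor LEDs stay off, and alarm 7-2005-1 keeps its original critical severity even though its raising event was later reconfigured.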

Facility alarm hierarchy

Facility alarms for child objects are not raised for the failure of a parent object. For example, when an MDA or XMA fails (or is shut down), a set of port facility alarms is not raised.

When a parent facility alarm is cleared, child facility alarms whose conditions still exist on the node appear in the active facility alarms list. For example, when a port fails there is a port facility alarm, but if the MDA or XMA is later shut down, the port alarm is cleared (and a card alarm is active for the MDA or XMA). If the MDA or XMA comes back into service, and the port is still down, then a port alarm becomes active again.

The supported facility alarm hierarchy is as follows (parent objects that are down cause alarms in all children to be masked):

  • CPM -> Compact Flash

  • CCM -> Compact Flash

  • IOM/IMM -> MDA -> Port -> Channel

  • XCM -> XMA -> Port

Note: A masked facility alarm is not the same as a cleared facility alarm. The cleared facility alarm queue does not display entries for previously raised facility alarms that are currently masked. If the masking event goes away, then the previously raised facility alarms are visible again in the active facility alarm queue.
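The masking behavior can be sketched in Python as a walk up the object hierarchy. The object names (iom-1, mda-1/1, and so on), the PARENT map, and the use of "has a raised alarm" as a stand-in for "parent is down" are assumptions for this example; the sketch only mirrors the IOM/IMM -> MDA -> Port chain listed above.

```python
# Child -> parent relationships for a hypothetical IOM/MDA/port layout.
PARENT = {
    "mda-1/1": "iom-1",
    "port-1/1/1": "mda-1/1",
    "port-1/1/2": "mda-1/1",
}


def is_masked(obj, raised):
    """Return True if any ancestor of obj currently has a raised alarm."""
    parent = PARENT.get(obj)
    while parent is not None:
        if parent in raised:
            return True
        parent = PARENT.get(parent)
    return False


def visible_alarms(raised):
    """Alarms shown in the active list: raised and not masked by a parent."""
    return {obj for obj in raised if not is_masked(obj, raised)}


raised = {"port-1/1/1"}
before = visible_alarms(raised)        # the port alarm is visible

raised.add("mda-1/1")                  # parent goes down
during = visible_alarms(raised)        # the port alarm is masked

raised.discard("mda-1/1")              # parent recovers, port still down
after = visible_alarms(raised)         # the port alarm is visible again
```

The port entry never leaves the raised set while the MDA is down, which reflects the point of the note: a masked alarm is hidden, not cleared.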

Facility alarm list

Table 1 (Facility alarm, facility alarm name, raising log event, sample details string and clearing log event) and Table 2 (Facility alarm name/raising log event, cause, effect and recovery) list the supported facility alarms.

Table 1. Facility alarm, facility alarm name, raising log event, sample details string and clearing log event
Facility alarm

Facility alarm name/raising log event

Sample details string

Clearing log event

295-2430-1

tmnxPowerSupplyFanFailed

Chassis 1 Power Shelf 1 Power Module 3 fan failed

tmnxPowerSupplyFanFailedClear

59-2004-1

linkDown

Interface intf-towards-node-B22 is not operational

linkUp

64-2091-1

tmnxSysLicenseInvalid

Error - <reason> record. <hw> will reboot the chassis <timeRemaining>

tmnxSysLicenseValid

64-2092-1

tmnxSysLicenseExpiresSoon

The license installed on <hw> expires <timeRemaining>

tmnxSysLicenseValid

64-2221-1

tmnxSysStandbyLicensingError

CPM B is not licensed; license record not found

tmnxSysStandbyLicensingReady

93-2006-1

tmnxSatSyncIfTimHoldover

Synchronous timing interface on satellite esat-1 is in holdover state

tmnxSatSyncIfTimHoldoverClear

93-2008-1

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)'

Synchronous timing interface on satellite, alarm on reference 1

tmnxSatSyncIfTimRef1AlarmClear

93-2008-2

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)'

Synchronous timing interface on satellite, alarm on reference 1

same as 93-2008-1

93-2008-3

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)'

Synchronous timing interface on satellite, alarm on reference 1

same as 93-2008-1

93-2010-x

same as 93-2008-x but for ref2

same as 93-2008-x but for ref2

same as 93-2008-x but for ref2

7-2001-1

tmnxEqCardFailure

Class MDA Module: failed, reason: Mda 1 failed startup tests

tmnxChassisNotificationClear

7-2003-1

tmnxEqCardRemoved

Class CPM Module: removed

tmnxEqCardInserted

7-2004-1

tmnxEqWrongCard

Class IOM Module: wrong type inserted

tmnxChassisNotificationClear

7-2005-1

tmnxEnvTempTooHigh

Chassis 1: temperature too high

tmnxChassisNotificationClear

7-2011-1

tmnxEqPowerSupplyRemoved

Power supply 1, power lost

tmnxEqPowerSupplyInserted

7-2017-1

tmnxEqSyncIfTimingHoldover

Synchronous Timing interface in holdover state

tmnxEqSyncIfTimingHoldoverClear

7-2019-1

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)'

Synchronous Timing interface, alarm los on reference 1

tmnxEqSyncIfTimingRef1AlarmClear

7-2019-2

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)'

Synchronous Timing interface, alarm oof on reference 1

same as 7-2019-1

7-2019-3

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)'

Synchronous Timing interface, alarm oopir on reference 1

same as 7-2019-1

7-2021-x

same as 7-2019-x but for ref2

same as 7-2019-x but for ref2

same as 7-2019-x but for ref2

7-2030-x

same as 7-2019-x but for the BITS input

same as 7-2019-x but for the BITS input

same as 7-2019-x but for the BITS input

7-2033-1

tmnxChassisUpgradeInProgress

Class CPM Module: software upgrade in progress

tmnxChassisUpgradeComplete

7-2073-x

same as 7-2019-x but for the BITS2 input

same as 7-2019-x but for the BITS2 input

same as 7-2019-x but for the BITS2 input

7-2092-1

tmnxEqPowerCapacityExceeded

The system has reached maximum power capacity <x> watts

tmnxEqPowerCapacityExceededClear

7-2094-1

tmnxEqPowerLostCapacity

The system can no longer support configured devices. Power capacity dropped to <x> watts

tmnxEqPowerLostCapacityClear

7-2096-1

tmnxEqPowerOverloadState

The system has reached critical power capacity. Increase available power now

tmnxEqPowerOverloadStateClear

7-2104-1

tmnxEqLowSwitchFabricCap

The switch fabric capacity is less than the forwarding capacity of IOM 1 because of errors in fabric links

tmnxEqLowSwitchFabricCapClear

7-2134-1

tmnxSyncIfTimBITS2048khzUnsup

The revision of 1/1 does not meet the specifications to support the 2048kHz BITS interface type

tmnxSyncIfTimBITS2048khzUnsupClr

7-2136-1

tmnxEqMgmtEthRedStandbyRaise

The standby CPM's management Ethernet port A/1 is serving as the system's management Ethernet port

tmnxEqMgmtEthRedStandbyClear

7-2138-1

tmnxEqPhysChassPowerSupOvrTmp

Power supply 2 over temperature

tmnxEqPhysChassPowerSupOvrTmpClr

7-2140-1

tmnxEqPhysChassPowerSupAcFail

Power supply 1 AC failure

tmnxEqPhysChassPowerSupAcFailClr

7-2142-1

tmnxEqPhysChassPowerSupDcFail

Power supply 2 DC failure

tmnxEqPhysChassPowerSupDcFailClr

7-2144-1

tmnxEqPhysChassPowerSupInFail

Power supply 1 input failure

tmnxEqPhysChassPowerSupInFailClr

7-2146-1

tmnxEqPhysChassPowerSupOutFail

Power supply 1 output failure

tmnxEqPhysChassPowerSupOutFailClr

7-2148-1

tmnxEqPhysChassisFanFailure

Fan 2 failed

tmnxEqPhysChassisFanFailureClear

7-2153-1

tmnxCpmMemSizeMismatch

The standby CPM A has a different memory size than the active B

tmnxCpmMemSizeMismatchClear

7-2156-1

tmnxPhysChassPwrSupWrgFanDir

The front to back fan direction for chassis 1 power supply 1 is not supported

tmnxPhysChassPwrSupWrgFanDirClr

7-2157-1

tmnxPhysChassPwrSupPemACRect

Chassis 1 power supply 1 acRec1 failed or missing

tmnxPhysChassPwrSupPemACRectClr

7-2159-1

tmnxPhysChassPwrSupInputFeed

Chassis 1 power supply 1 inputFeedA not supplying power

tmnxPhysChassPwrSupInputFeedClr

7-2161-1

tmnxEqBpEpromFail

The active CPM is no longer able to access any of backplane EPROMs because of a hardware defect

tmnxEqBpEpromFailClear

7-2163-1

tmnxEqBpEpromWarning

The active CPM is no longer to access one backplane EPROM because of a hardware defect but a redundant EPROM is present and accessible.

tmnxEqBpEpromWarningClear

7-2165-1

tmnxPhysChassisPCMInputFeed

Chassis 1 pcm 1 1 not supplying power

tmnxPhysChassisPCMInputFeedClr

7-2190-1

tmnxPhysChassisPMOutFail

Chassis 1 Power Shelf 1 Power Module 4 output failure

tmnxPhysChassisPMOutFailClr

7-2192-1

tmnxPhysChassisPMInputFeed

Chassis 1 Power Shelf 1 Power Module 3 inputFeedA inputFeedB not supplying power

tmnxPhysChassisPMInputFeedClr

7-2194-1

tmnxPhysChassisFilterDoorOpen

Filter door is missing or open

tmnxPhysChassisFilterDoorClosed

7-2196-1

tmnxPhysChassisPMOverTemp

Chassis 1 Power Shelf 1 over temperature

tmnxPhysChassisPMOverTempClr

7-2203-x

same as 7-2019-x but for SyncE

same as 7-2019-x but for SyncE

same as 7-2019-x but for SyncE

7-2205-x

same as 7-2019-x but for E2

same as 7-2019-x but for E2

same as 7-2019-x but for E2

7-4001-1

tmnxInterChassisCommsDown

Control communications disrupted between the Active CPM and the chassis

tmnxInterChassisCommsUp

7-4003-1

tmnxCpmIcPortDown

CPM Interconnect Port is not operational. Error code = invalid-connection

tmnxCpmIcPortUp

7-4006-1

tmnxCpmIcPortSFFRemoved

CPM interconnect port SFF removed

tmnxCpmIcPortSFFInserted

7-4007-1

tmnxCpmNoLocalIcPort

CPM A cannot reach the chassis using its local CPM interconnect ports

tmnxCpmLocalIcPortAvail

7-4017-1

tmnxSfmIcPortDown

SFM interconnect Port is not operational. Error code = invalid-connection to Fabric 10 IcPort 2

tmnxSfmIcPortUp

7-6002-1

tmnxPowerShelfCommsDown

Chassis 1 Power Shelf 1 lost communication with cpmA

tmnxPowerShelfCommsUp

7-6005-1

tmnxPowerShelfOutputStatusDown

Chassis 1 Power Shelf 2 output status switched to off

tmnxPowerShelfOutputStatusUp
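Several rows in Table 1 use an -x suffix to stand for a family of timing reference alarms. As a reading aid, the following Python sketch expands such an identifier into the timing reference and the alarm type, using only the values shown in the table. The describe function and the dictionary names are hypothetical helpers, not part of SR OS.

```python
# Values of the tmnxSyncIfTimingNotifyAlarm attribute, as listed in Table 1.
NOTIFY_ALARM = {1: "los", 2: "oof", 3: "oopir"}

# Alarm ID prefixes for the timing reference families in Table 1.
TIMING_FAMILIES = {
    "7-2019": "ref1",
    "7-2021": "ref2",
    "7-2030": "BITS",
    "7-2073": "BITS2",
    "7-2203": "SyncE",
    "7-2205": "E2",
    "93-2008": "satellite ref1",
    "93-2010": "satellite ref2",
}


def describe(alarm_id):
    """Return (timing reference, alarm type), or None for other alarms."""
    prefix, _, sub = alarm_id.rpartition("-")
    if prefix not in TIMING_FAMILIES or not sub.isdigit():
        return None
    alarm_type = NOTIFY_ALARM.get(int(sub))
    if alarm_type is None:
        return None
    return TIMING_FAMILIES[prefix], alarm_type


ref2_oof = describe("7-2021-2")
card_failure = describe("7-2001-1")
```

For example, 7-2021-2 resolves to an oof alarm on ref2, while an identifier outside these families, such as 7-2001-1, returns None.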

Table 2. Facility alarm name/raising log event, cause, effect and recovery
Facility alarm

Facility alarm name/raising log event

Cause

Effect

Recovery

295-2430-1

tmnxPowerSupplyFanFailed

The tmnxPowerSupplyFanFailed notification is generated when a fan within a particular power-supply has ceased to function normally.

Cooling to the power-supply may be reduced, potentially leading to overheating.

The power-supply should be replaced by one with fully-functioning fan elements.

59-2004-1

linkDown

A linkDown trap signifies that the SNMP entity, acting in an agent role, has detected that the ifOperStatus object for one of its communication links is about to enter the down state from some other state (but not from the notPresent state).

The indicated interface is taken down.

If the ifAdminStatus is down then the interface state is deliberate and there is no recovery.

If the ifAdminStatus is up then try to determine the cause of the interface going down: cable cut, distal end went down, and so on.

64-2091-1

tmnxSysLicenseInvalid

Generated when the license becomes invalid for the reason specified in the log event/alarm.

The system reboots at the end of the time remaining.

Configure a valid license file location and filename.

64-2092-1

tmnxSysLicenseExpiresSoon

Generated when the license expires soon.

The system reboots at the end of the time remaining.

Configure a valid license file location and filename.

64-2221-1

tmnxSysStandbyLicensingError

Generated when the standby detects a licensing failure. The reason is specified in tmnxSysLicenseErrorReason.

The standby CPM may not synchronize and may be put into a failed state.

Configure a valid license file location and filename, given the value of tmnxSysLicenseErrorReason.

93-2006-1

tmnxSatSyncIfTimHoldover

The tmnxSatSyncIfTimHoldover notification is generated when the synchronous equipment timing subsystem of the satellite transitions into a holdover state.

The transmit timing of all synchronous interfaces on the satellite is no longer synchronous with the host. This could result in traffic loss.

Investigate the state of the two input timing references on the satellite and the links between the host and the satellite (the uplinks that drive them) for failures.

93-2008-1

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)'

The tmnxSatSyncIfTimRef1Alarm notification is generated when an alarm condition on the first timing reference is detected.

If the other timing reference is free of faults, the satellite no longer has a backup timing reference. If the other timing reference also has a fault, the satellite is likely no longer synchronous with the host.

Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults.

93-2008-2

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)'

The same cause as 93-2008-1

The same effect as 93-2008-1

Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults.

93-2008-3

tmnxSatSyncIfTimRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)'

The same cause as 93-2008-1

The same effect as 93-2008-1

Investigate the state of the link between the host and the satellite (the uplink) that drives the first timing reference on the satellite for faults.

93-2010-x

same as 93-2008-x but for ref2

The same cause as 93-2008-x but for ref2

The same as 93-2008-x but for ref2

The same as 93-2008-x but for ref2

7-2001-1

tmnxEqCardFailure

Generated when one of the cards in a chassis has failed. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on. The reason is indicated in the details of the log event or alarm, and also available in the tmnxChassisNotifyCardFailureReason attribute included in the SNMP notification.

The effect is dependent on the card that has failed. IOM (or XCM) or MDA (or XMA) failure causes a loss of service for all services running on that card. A fabric failure can impact traffic to and from all cards.

7750 SR, 7450 ESS: If the IOM/IMM fails then the two associated MDAs for the slot also go down.

7950 XRS: If one of the two XMAs in an XCM slot fails, then the XCM remains up. If the only remaining operational XMA in an XCM slot fails, then the XCM goes into a booting operational state.

Before taking any recovery steps collect a tech-support file, then try resetting (clear) the card. If unsuccessful, try removing and re-inserting the card. If that does not work then replace the card.

7-2003-1

tmnxEqCardRemoved

Generated when a card is removed from the chassis. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on.

The effect is dependent on the card that has been removed. IOM (or XCM) or MDA (or XMA) removal causes a loss of service for all services running on that card. A fabric removal can impact traffic to and from all cards.

Before taking any recovery steps collect a tech-support file, then try re-inserting the card. If unsuccessful, replace the card.

7-2004-1

tmnxEqWrongCard

Generated when the wrong type of card is inserted into a slot of the chassis. Even though a card may be physically supported by the slot, it may have been administratively configured to allow only specific card types in a particular slot location. The card type may be IOM (or XCM), MDA (or XMA), SFM, CCM, CPM, Compact Flash, and so on.

The effect is dependent on the card that has been incorrectly inserted. Incorrect IOM (or XCM) or MDA (or XMA) insertion causes a loss of service for all services running on that card.

Insert the correct card into the correct slot, and ensure the slot is configured for the correct type of card.

7-2005-1

tmnxEnvTempTooHigh

Generated when the temperature sensor reading on an equipment object is greater than its configured threshold.

This could be causing intermittent errors and could also cause permanent damage to components.

Remove or power off the affected cards, or improve the cooling to the node. More powerful fan trays may also be required.

7-2011-1

tmnxEqPowerSupplyRemoved

Generated when:

  • one of the power supplies is removed from the chassis

  • low input voltage is detected. The operating voltage range for the 7750 SR-7/12 and the 7450 ESS-7/12 is -40 to -72 VDC. The alarm is raised if the system detects that the voltage of the power supply has dropped to -42.5 VDC.

Reduced power can cause intermittent errors and could also cause permanent damage to components.

Re-insert the power supply or raise the input voltage to above -42.5 VDC.

7-2017-1

tmnxEqSyncIfTimingHoldover

Generated when the synchronous equipment timing subsystem transitions into a holdover state.

Any node-timed ports have very slow frequency drift limited by the central clock oscillator stability. The oscillator meets the holdover requirements of a Stratum 3 and G.813 Option 1 clock.

Address issues with the central clock input references.

7-2019-1

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'los(1)'

Generated when an alarm condition on the first timing reference is detected. The type of alarm (los, oof, and so on) is indicated in the details of the log event or alarm, and is also available in the tmnxSyncIfTimingNotifyAlarm attribute included in the SNMP notification. The SNMP notification has the same indexes as those of the tmnxCpmCardTable.

Timing reference 1 cannot be used as a source of timing into the central clock.

Address issues with the signal associated with timing reference 1.

7-2019-2

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oof(2)'

The same cause as 7-2019-1

The same effect as 7-2019-1

Address issues with the signal associated with timing reference 1.

7-2019-3

tmnxEqSyncIfTimingRef1Alarm with attribute tmnxSyncIfTimingNotifyAlarm == 'oopir(3)'

The same cause as 7-2019-1

The same effect as 7-2019-1

Address issues with the signal associated with timing reference 1.

7-2021-x

same as 7-2019-x but for ref2

The same cause as 7-2019-x but for the second timing reference

The same as 7-2019-x but for the second timing reference

The same as 7-2019-x but for the second timing reference

7-2030-x

same as 7-2019-x but for the BITS input

The same cause as 7-2019-x but for the BITS timing reference

The same as 7-2019-x but for the BITS timing reference

The same as 7-2019-x but for the BITS timing reference

7-2033-1

tmnxChassisUpgradeInProgress

The tmnxChassisUpgradeInProgress notification is generated only after a CPM switchover occurs and the new active CPM is running new software, while the IOMs or XCMs are still running old software. This is the start of the upgrade process. The tmnxChassisUpgradeInProgress notification continues to be generated every 30 minutes while at least one IOM or XCM is still running older software.

A software mismatch between the CPM and IOM or XCM is generally fine for a short duration (during an upgrade) but may not allow for correct long term operation.

Complete the upgrade of all IOMs or XCMs.

7-2073-x

same as 7-2019-x but for the BITS2 input

The same as 7-2019-x but for the BITS 2 timing reference

The same as 7-2019-x but for the BITS 2 timing reference

The same as 7-2019-x but for the BITS 2 timing reference

7-2092-1

tmnxEqPowerCapacityExceeded

Generated when a device needs power to boot, but there is not enough power capacity to support the device.

A non-powered device does not boot until the power capacity is increased to support the device.

Add a new power supply to the system, or replace the faulty power supply with a working one.

7-2094-1

tmnxEqPowerLostCapacity

Generated when a power supply fails or is removed which puts the system in an overloaded situation.

Devices are powered off in order of lowest power priority until the available power capacity can support the powered devices.

Add a new power supply to the system, or replace the faulty power supply with a working one.

7-2096-1

tmnxEqPowerOverloadState

Generated when the overloaded power capacity cannot support the power requirements and there are no further devices that can be powered off.

The system runs a risk of experiencing brownouts while the available power capacity does not meet the required power consumption.

Add power capacity or manually shut down devices until the power capacity meets the power needs.

7-2104-1

tmnxEqLowSwitchFabricCap

The tmnxEqLowSwitchFabricCap alarm is generated when the total switch fabric capacity becomes less than the IOM capacity because of link failures. At least one of the taps on the IOM is below 100% capacity.

There is diminished switch fabric capacity to forward service-impacting information.

If the system does not self-recover, the IOM must be rebooted.

7-2134-1

tmnxSyncIfTimBITS2048khzUnsup

The tmnxSyncIfTimBITS2048khzUnsup notification is generated when the value of tSyncIfTimingAdmBITSIfType is set to 'g703-2048khz (5)' and the CPM does not meet the specifications for the 2048kHz BITS output signal under G.703.

The BITS input is not used as the Sync reference and the 2048kHz BITS output signal generated by the CPM is squelched.

Replace the CPM with one that is capable of generating the 2048kHz BITS output signal, or set tSyncIfTimingAdmBITSIfType to a value other than 'g703-2048khz (5)'.

7-2136-1

tmnxEqMgmtEthRedStandbyRaise

The tmnxEqMgmtEthRedStandbyRaise notification is generated when the active CPM's management Ethernet port goes operationally down and the standby CPM's management Ethernet port is operationally up and now serving as the system's management Ethernet port.

The management Ethernet port is no longer redundant. The node can be managed via the standby CPM's management Ethernet port only.

Bring the active CPM's management Ethernet port operationally up.

7-2138-1

tmnxEqPhysChassPowerSupOvrTmp

Generated when the temperature sensor reading on a power supply module is greater than its configured threshold.

This could be causing intermittent errors and could also cause permanent damage to components.

Remove or power off the affected power supply module or improve the cooling to the node. More powerful fan trays may also be required. The power supply itself may be faulty so replacement may be necessary.

7-2140-1

tmnxEqPhysChassPowerSupAcFail

Generated when an AC failure is detected on a power supply.

Reduced power can cause intermittent errors and could also cause permanent damage to components.

First try re-inserting the power supply. If unsuccessful, replace the power supply.

7-2142-1

tmnxEqPhysChassPowerSupDcFail

Generated when a DC failure is detected on a power supply.

Reduced power can cause intermittent errors and could also cause permanent damage to components.

First try re-inserting the power supply. If unsuccessful, then replace the power supply.

7-2144-1

tmnxEqPhysChassPowerSupInFail

Generated when an input failure is detected on a power supply.

Reduced power can cause intermittent errors and could also cause permanent damage to components.

First try re-inserting the power supply. If that does not work, then replace the power supply.

7-2146-1

tmnxEqPhysChassPowerSupOutFail

Generated when an output failure is detected on a power supply.

Reduced power can cause intermittent errors and could also cause permanent damage to components.

First try re-inserting the power supply. If that does not work, then replace the power supply.

7-2148-1

tmnxEqPhysChassisFanFailure

Generated when one of the fans in a fan tray has failed.

This could cause the temperature to rise and result in intermittent errors and potentially permanent damage to components.

Replace the fan tray immediately, improve the cooling to the node, or reduce the heat being generated in the node by removing cards or powering down the node.

7-2153-1

tmnxCpmMemSizeMismatch

A tmnxCpmMemSizeMismatch notification is generated when the RAM memory size of the standby CPM (that is, tmnxChassisNotifyCpmCardSlotNum) is different from the active CPM (that is, tmnxChassisNotifyHwIndex).

There is an increased risk of memory overflow on the standby CPM during a CPM switchover.

Use CPMs with the same memory size.

7-2156-1

tmnxPhysChassPwrSupWrgFanDir

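The tmnxPhysChassPwrSupWrgFanDir notification is generated when the airflow direction of the power supply's fan is not supported. The clearing notification, tmnxPhysChassPwrSupWrgFanDirClr, is generated when the airflow direction of the power supply's fan is corrected.

Until the condition clears, the fan is not cooling the power supply in the correct direction.

No recovery is required after the airflow direction is corrected.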

7-2157-1

tmnxPhysChassPwrSupPemACRect

The tmnxPhysChassPwrSupPemACRect notification is generated if any one of the AC rectifiers for a power supply is in a failed state or is missing.

There is an increased risk of the power supply failing, causing insufficient power to the system.

Bring the AC rectifiers back online.

7-2159-1

tmnxPhysChassPwrSupInputFeed

The tmnxPhysChassPwrSupInputFeed notification is generated if any one of the input feeds for a power supply is not supplying power.

There is an increased risk of system power brown-outs or black-outs.

Restore all of the input feeds that are not supplying power.

7-2161-1

tmnxEqBpEpromFail

The tmnxEqBpEpromFail alarm is generated when the active CPM is no longer able to access any of backplane EPROMs because of a hardware defect.

The active CPM is at risk of failing to initialize after node reboot because of not being able to access the BP EPROM to read the chassis type.

The system does not self-recover and Nokia Support has to be contacted for further instructions.

7-2163-1

tmnxEqBpEpromWarning

The tmnxEqBpEpromWarning alarm is generated when the active CPM is no longer able to access one backplane EPROM because of a hardware defect, but a redundant EPROM is present and accessible.

There is no effect on system operation.

No recovery action required.

7-2165-1

tmnxPhysChassisPCMInputFeed

The tmnxPhysChassisPCMInputFeed notification is generated if any one of the input feeds for a PCM has gone offline.

There is an increased risk of system power brown-outs or black-outs.

Restore all of the input feeds that are not supplying power.

7-2190-1

tmnxPhysChassisPMOutFail

The tmnxPhysChassisPMOutFail notification is generated when an output failure occurs on the power module.

The power module is no longer operational.

Insert a new power module.

7-2192-1

tmnxPhysChassisPMInputFeed

The tmnxPhysChassisPMInputFeed notification is generated if any one of the input feeds for a power module is not supplying power.

There is an increased risk of system power brownouts or blackouts.

Restore all of the input feeds that are not supplying power.

7-2194-1

tmnxPhysChassisFilterDoorOpen

The tmnxPhysChassisFilterDoorOpen notification is generated when the filter door is either open or not present.

Power shelf protection may be compromised.

If the filter door is not installed, install it. Close the filter door.

7-2196-1

tmnxPhysChassisPMOverTemp

The tmnxPhysChassisPMOverTemp notification is generated when a power module's temperature surpasses the temperature threshold.

The power module is no longer operational.

Check input feed or insert a new power module.

7-2203-x

same as 7-2019-x but for SyncE

The same cause as 7-2019-x but for SyncE

same as 7-2019-x but for SyncE

same as 7-2019-x but for SyncE

7-2205-x

same as 7-2019-x but for E2

The same cause as 7-2019-x but for E2

same as 7-2019-x but for E2

same as 7-2019-x but for E2

7-4001-1

tmnxInterChassisCommsDown

The tmnxInterChassisCommsDown alarm is generated when the active CPM cannot reach the far-end chassis.

The resources on the far-end chassis are not available. This event for the far-end chassis means that the CPM, SFM, and XCM cards in the far-end chassis reboot and remain operationally down until communications are re-established.

Ensure that all CPM interconnect ports in the system are properly cabled together with working cables.

7-4003-1

tmnxCpmIcPortDown

The tmnxCpmIcPortDown alarm is generated when the CPM interconnect port is not operational. The reason may be a cable connected incorrectly, a disconnected cable, a faulty cable, or a misbehaving CPM interconnect port or card.

At least one of the control plane paths used for inter-chassis CPM communication is not operational. Other paths may be available.

Manually verify and test each CPM interconnect port to ensure fully functional operation. Physical replacement of cabling may be required.

7-4006-1

tmnxCpmIcPortSFFRemoved

The tmnxCpmIcPortSFFRemoved notification is generated when the SFF (for example, a QSFP) is removed from the CPM interconnect port. Removing an SFF causes both this trap and a tmnxCpmIcPortDown event.

Removing the SFF causes the CPM interconnect port to go down. This port can no longer be used as part of the control plane between chassis, but other paths may be available.

Insert a working SFF into the port.

7-4007-1

tmnxCpmNoLocalIcPort

The tmnxCpmNoLocalIcPort alarm is generated when the CPM cannot reach the other chassis using its local CPM interconnect ports.

Another control communications path may still be available between the CPM and the other chassis via the mate CPM in the same chassis. If that alternative path is not available then complete disruption of control communications to the other chassis occurs and the tmnxInterChassisCommsDown alarm is raised.

A tmnxCpmNoLocalIcPort alarm on the active CPM indicates that a further failure of the local CPM interconnect ports on the standby CPM causes complete disruption of control communications to the other chassis and the tmnxInterChassisCommsDown alarm is raised.

A tmnxCpmNoLocalIcPort alarm on the standby CPM indicates that a CPM switchover may cause temporary disruption of control communications to the other chassis while the rebooting CPM comes back into service.

Ensure that all CPM interconnect ports in the system are properly cabled together with working cables.

7-4017-1

tmnxSfmIcPortDown

The tmnxSfmIcPortDown alarm is generated when the SFM interconnect port is not operational. The reason may be a cable connected incorrectly, a disconnected cable, a faulty cable, or a misbehaving SFM interconnect port or SFM card.

This port can no longer be used as part of the user plane fabric between chassis. Other fabric paths may be available, resulting in no loss of capacity.

Manually verify and test each SFM interconnect port to ensure fully functional operation. Physical replacement of cabling may be required.

7-6002-1

tmnxPowerShelfCommsDown

The tmnxPowerShelfCommsDown alarm is generated when there is a loss of communications with the power shelf controller.

If there is a power failure, it is not detected because the power modules cannot be polled. The system continues to report the state of the power modules as they were when last seen.

Correct the power shelf controller communications problem.

7-6005-1

tmnxPowerShelfOutputStatusDown

The tmnxPowerShelfOutputStatusDown alarm is generated when the physical output switch on the power shelf is set to Standby.

The power output from the identified power shelf is switched off and does not supply power to the system.

Set the output switch to On to restore power output.
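
The relationship between the tmnxCpmNoLocalIcPort and tmnxInterChassisCommsDown alarms described in the 7-4007-1 entry can be sketched as a small decision function. This is an illustrative Python sketch only, not SR OS code: the function name and its two boolean inputs are hypothetical, and it assumes that the path via the mate CPM is usable only while the mate's own local CPM interconnect ports can reach the other chassis.

```python
def interchassis_alarms(active_local_ok, standby_local_ok):
    """Return the alarms implied by local CPM interconnect reachability.

    Both arguments are hypothetical booleans: True means that the CPM can
    reach the other chassis through its own local CPM interconnect ports.
    """
    alarms = set()
    if not active_local_ok:
        alarms.add(("tmnxCpmNoLocalIcPort", "active"))
    if not standby_local_ok:
        alarms.add(("tmnxCpmNoLocalIcPort", "standby"))
    # Assumption: the alternative path via the mate CPM exists only while
    # the mate's local ports work, so losing both sides cuts all control
    # communications to the other chassis.
    if not active_local_ok and not standby_local_ok:
        alarms.add(("tmnxInterChassisCommsDown", "system"))
    return alarms


if __name__ == "__main__":
    # Active CPM has lost its local ports; the standby still has a path.
    print(sorted(interchassis_alarms(False, True)))
    # Both CPMs have lost their local ports.
    print(sorted(interchassis_alarms(False, False)))
```

In this model, a tmnxCpmNoLocalIcPort alarm on only one CPM is a loss of redundancy, and tmnxInterChassisCommsDown appears only when both CPMs have lost their local paths.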

Table 3 and Table 4 show the supported ESA facility alarms.

Table 3. ESA facility alarm, facility alarm name, raising log event, sample details string, and clearing log event
ESA Event and states | Facility Alarm | Severity | Facility alarm name/raising log event | Sample details string | Clearing log event
ESA HW Status: degraded | 2400-1 | Major | tmnxEsaHwStatusDegraded | ESA 3 aggregate hardware status degraded | tmnxEsaHwStatusDegradedClr
ESA HW Status: critical | 2402-1 | Critical | tmnxEsaHwStatusCritical | ESA 3 aggregate hardware status critical | tmnxEsaHwStatusCriticalClr
ESA HW Power Supply 1 Degraded | 2404-1 | Critical | tmnxEsaHwPwrSup1Degraded | ESA 3 power supply 1 status degraded | tmnxEsaHwPwrSup1DegradedClr
ESA HW Power Supply 1 Failed | 2406-1 | Critical | tmnxEsaHwPwrSup1Failed | ESA 3 power supply 1 status failed | tmnxEsaHwPwrSup1FailedClr
ESA HW Power Supply 2 Degraded | 2408-1 | Critical | tmnxEsaHwPwrSup2Degraded | ESA 3 power supply 2 status degraded | tmnxEsaHwPwrSup2DegradedClr
ESA HW Power Supply 2 Failed | 2410-1 | Critical | tmnxEsaHwPwrSup2Failed | ESA 3 power supply 2 status failed | tmnxEsaHwPwrSup2FailedClr
ESA HW Fan Bank Non Redundant | 2412-1 | Major | tmnxEsaHwFanBankNonRedun | ESA 3 fan bank redundancy degraded | tmnxEsaHwFanBankNonRedunClr
ESA HW Fan Bank Failed Redundancy | 2414-1 | Critical | tmnxEsaHwFanBankFailRedun | ESA 3 fan bank redundancy failed | tmnxEsaHwFanBankFailRedunClr
ESA HW Fan Status Degraded | 2416-1 | Critical | tmnxEsaHwFanStatusDegraded | ESA 3 fan status degraded | tmnxEsaHwFanStatusDegradedClr
ESA HW Fan Status Failed | 2418-1 | Critical | tmnxEsaHwFanStatusFailed | ESA 3 fan status failed | tmnxEsaHwFanStatusFailedClr
ESA HW Power Supply Mismatch | 2420-1 | Major | tmnxEsaHwPwrSupMismatch | ESA 3 power supply mismatch | tmnxEsaHwPwrSupMismatchClr
ESA HW Power Supply Bank Non Redundant | 2422-1 | Major | tmnxEsaHwPwrSupBankNonRedun | ESA 3 power supply bank redundancy degraded | tmnxEsaHwPwrSupBankNonRedunClr
ESA HW Temperature Degraded | 2426-1 | Critical | tmnxEsaHwTemperatureDegraded | ESA 3 temperature status degraded | tmnxEsaHwTemperatureDegradedClr
ESA HW Temperature Failed | 2428-1 | Critical | tmnxEsaHwTemperatureFailed | ESA 3 temperature status failed | tmnxEsaHwTemperatureFailedClr
Table 4. ESA facility alarm name/raising log event, cause, effect, and recovery
Facility Alarm | Facility alarm name/raising log event | Cause | Effect | Recovery
2400-1 | tmnxEsaHwStatusDegraded | Generated when the ESA hardware status is degraded | Service may be affected | Contact Nokia customer support
2402-1 | tmnxEsaHwStatusCritical | Generated when one or more ESA hardware statuses are critical | Service may be affected | Contact Nokia customer support
2404-1 | tmnxEsaHwPwrSup1Degraded | Generated when the ESA power supply 1 is degraded | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support
2406-1 | tmnxEsaHwPwrSup1Failed | Generated when the ESA power supply 1 fails | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support
2408-1 | tmnxEsaHwPwrSup2Degraded | Generated when the ESA power supply 2 is degraded | Power supply operation and reliability may be affected | Contact Nokia customer support
2410-1 | tmnxEsaHwPwrSup2Failed | Generated when the ESA power supply 2 fails | Power supply and redundancy are affected. ESA operation may be affected. | Contact Nokia customer support
2412-1 | tmnxEsaHwFanBankNonRedun | Generated when 1 to 6 of the 7 fans are failed | ESA cooling may be inadequate | Contact Nokia customer support
2414-1 | tmnxEsaHwFanBankFailRedun | Generated when all 7 fans are failed | ESA cooling is inadequate, the ESA may shut down | Contact Nokia customer support
2416-1 | tmnxEsaHwFanStatusDegraded | Generated when one or more ESA fans are degraded | ESA cooling may be inadequate | Contact Nokia customer support
2418-1 | tmnxEsaHwFanStatusFailed | Generated when one or more ESA fans fail | ESA cooling may be inadequate | Contact Nokia customer support
2420-1 | tmnxEsaHwPwrSupMismatch | Generated when the ESA power supplies do not match | ESA power supplies must be matched | Equip the ESA with matching power supplies
2422-1 | tmnxEsaHwPwrSupBankNonRedun | Generated when one ESA power supply has failed and the other is running | ESA will fail if another power supply failure occurs | Contact Nokia customer support
2426-1 | tmnxEsaHwTemperatureDegraded | Generated when one or more ESA temperatures is outside the expected operating range | If the ESA temperature remains outside the expected operating range, the ESA may shut down | The ESA may need maintenance to rectify the issue. Contact Nokia customer support.
2428-1 | tmnxEsaHwTemperatureFailed | Generated when the ESA temperature is critical | If the ESA temperature remains outside the expected operating range, the ESA may shut down | The ESA may need maintenance to rectify the issue. Contact Nokia customer support.
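
For monitoring scripts, the ESA alarm data in Table 3 can be held in a simple lookup structure. The following Python sketch is one possible arrangement, not an SR OS interface: the dictionary and helper names are hypothetical, while the alarm IDs, severities, and event names are copied from Table 3. It relies on the pattern visible in Table 3 that each clearing log event is the raising log event name with the suffix Clr.

```python
# Alarm ID -> (severity, raising log event), transcribed from Table 3.
ESA_ALARMS = {
    "2400-1": ("Major", "tmnxEsaHwStatusDegraded"),
    "2402-1": ("Critical", "tmnxEsaHwStatusCritical"),
    "2404-1": ("Critical", "tmnxEsaHwPwrSup1Degraded"),
    "2406-1": ("Critical", "tmnxEsaHwPwrSup1Failed"),
    "2408-1": ("Critical", "tmnxEsaHwPwrSup2Degraded"),
    "2410-1": ("Critical", "tmnxEsaHwPwrSup2Failed"),
    "2412-1": ("Major", "tmnxEsaHwFanBankNonRedun"),
    "2414-1": ("Critical", "tmnxEsaHwFanBankFailRedun"),
    "2416-1": ("Critical", "tmnxEsaHwFanStatusDegraded"),
    "2418-1": ("Critical", "tmnxEsaHwFanStatusFailed"),
    "2420-1": ("Major", "tmnxEsaHwPwrSupMismatch"),
    "2422-1": ("Major", "tmnxEsaHwPwrSupBankNonRedun"),
    "2426-1": ("Critical", "tmnxEsaHwTemperatureDegraded"),
    "2428-1": ("Critical", "tmnxEsaHwTemperatureFailed"),
}


def clearing_event(alarm_id):
    """Return the clearing log event name for an ESA facility alarm ID."""
    _severity, raising_event = ESA_ALARMS[alarm_id]
    # In Table 3, every clearing event is the raising event plus "Clr".
    return raising_event + "Clr"


def alarms_with_severity(severity):
    """Return the sorted ESA alarm IDs that carry the given severity."""
    return sorted(a for a, (sev, _name) in ESA_ALARMS.items() if sev == severity)


if __name__ == "__main__":
    print(clearing_event("2412-1"))
    print(alarms_with_severity("Major"))
```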

The linkDown facility alarm is supported for the objects listed in Table 5 (note that not all objects are supported on all platforms):

Table 5. linkDown Facility Alarm support
Object | Supported
Ethernet Ports | Yes
Sonet Section, Line and Path (POS) | Yes
TDM Ports (E1, T1, DS3) including CES MDAs | Yes
TDM Channels (DS3 channel configured in an STM-1 port) | Yes
Ethernet LAGs | No
APS groups | No
Ethernet VLANs | No

Configuring facility alarms with CLI

This section provides information to configure facility alarms using the command line interface.

Enabling facility alarms

The following example shows how to enable facility alarms.

MD-CLI

[ex:/configure system alarms]
A:admin@node-2# info
    admin-state enable 

classic CLI

A:node-2>config>system# alarms
#------------------------------------------
        no shutdown
        exit
---------------------------------------------- 

Common configuration tasks

Configuring the maximum number of alarms to clear

You can configure the number of entries to keep in the list of cleared alarms. Use the following command to configure the maximum number of cleared alarms to keep.

configure system alarms max-cleared

MD-CLI

[ex:/configure system alarms]
A:admin@node-2# info
    max-cleared 100

classic CLI

*A:node-2>config>system>alarms# max-cleared 100
A:node-2>config>system>alarms# info
----------------------------------------------
            max-cleared 100
----------------------------------------------
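
The effect of a cap on the cleared alarm list can be illustrated with a bounded queue. The following Python sketch is a conceptual model only, not a description of the SR OS implementation: the class and method names are hypothetical, and it assumes that the oldest cleared entry is discarded first when the limit is reached, which this section does not state.

```python
from collections import deque


class FacilityAlarmList:
    """Toy model of an active alarm list plus a bounded cleared list."""

    def __init__(self, max_cleared):
        self.active = {}
        # deque(maxlen=N) silently drops the oldest entry when full
        # (assumed eviction order; not confirmed for SR OS).
        self.cleared = deque(maxlen=max_cleared)

    def raise_alarm(self, alarm_id, details):
        """Record an alarm as active."""
        self.active[alarm_id] = details

    def clear_alarm(self, alarm_id):
        """Move an active alarm to the cleared list; ignore unknown IDs."""
        details = self.active.pop(alarm_id, None)
        if details is not None:
            self.cleared.append((alarm_id, details))


if __name__ == "__main__":
    alarms = FacilityAlarmList(max_cleared=2)
    for alarm_id in ("7-2192-1", "7-2194-1", "7-2196-1"):
        alarms.raise_alarm(alarm_id, "example details")
        alarms.clear_alarm(alarm_id)
    # Only the two most recently cleared alarms remain.
    print([alarm_id for alarm_id, _ in alarms.cleared])
```

A larger limit keeps more history of recently cleared alarms available for display at the cost of more stored entries.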