Fault management
Overview
The fault management system that the CPAM provides includes:
- RCA audit policy for identifying problems or errors in protocol-related and control plane configuration
RCA audit policies
The CPAM allows you to perform on-demand verifications of the IP/MPLS configuration of IS-IS and OSPF routing protocols on NFM-P-managed NEs. RCA audit policies identify problems or errors in protocol-related and control plane configurations.
RCA audit policies can be created for IGP administrative domains to verify the configuration of network objects on NFM-P-managed routers. For each audit policy, you can specify one or more policy entries that define the scope of the configuration verified by the audit and the severity of the related problem.
See “RCA audit policies” in the NSP NFM-P Control Plane Assurance Manager User Guide for more information about RCA audit policies.
RCA audit policies workflow
The following workflow describes how to create and use RCA audit policies.
1. Create an RCA audit policy. Configure the attributes that are verified for each RCA audit policy entry.
2. Run the RCA audit policy on a specific object.
3. Identify the problems and correct the configuration.
RCA audit policies creation
The rca package is used for policy-driven IP/MPLS configuration audits for OSPF and IS-IS routing protocols.
When you create an RCA audit policy without specifying any audit policy entry, all entries are created and enabled for the RCA audit by default. To audit only specific attributes, create audit policy entries for those attributes.
The following example shows the minimum XML parameter tags required to develop an XML request to create an OSPF RCA audit policy with an audit policy entry to audit only the MTU attribute of OSPF interfaces.
The generic.GenericObject.configureChildInstance method is used to create an RCA audit policy. The request response returns the <objectFullName> attribute to identify the new RCA audit policy.
RCA audit policy creation parameters
- distinguishedName - rca-mgr. The FDN can be determined by looking at the Parent Hierarchy of the class to be created in the XML API Reference. In this case, it is the rca.RcaManager class.
- childConfigInfo - creates and configures the following parameters:
  - <id> - (optional) assigns a unique numeric ID. If not specified, the NFM-P automatically assigns the ID.
  - <policyName> - ospf for an OSPF type RCA audit policy or isis for an IS-IS type RCA audit policy
  - <children-Set> - tag to enclose children objects
    • RCA audit policy entry object - rca.AuditPolicyEntry
    • actionMask - specifies the modify operation
    • <objectFullName> - rca-mgr:ID-${*id}:enId-${*entryId} - FDN of the RCA audit policy entry object
    • <mismatchAttributes> - set of attributes for which the audit is enabled or disabled, and the severity of the related problem
    • Mismatch attribute set - rca.MismatchAttribute
    • attributeName - OSPF interface attributes. See the ospf.Interface class in the XML API Reference.
    • severity - sets the severity, such as info, minor, major, or critical. See the XML API Reference for values.
    • enabled - true or false
Figure 21-21: OSPF RCA audit policy creation request example
<generic.GenericObject.configureChildInstance xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <distinguishedName>rca-mgr</distinguishedName>
  <childConfigInfo>
    <rca.AuditPolicy>
      <actionMask>
        <bit>create</bit>
      </actionMask>
      <id>100</id>
      <policyName>ospf</policyName>
      <children-Set>
        <rca.AuditPolicyEntry>
          <actionMask>
            <bit>modify</bit>
          </actionMask>
          <objectFullName>rca-mgr:ID-100:enId-1</objectFullName>
          <mismatchAttributes>
            <rca.MismatchAttribute>
              <attributeName>mtu</attributeName>
              <severity>critical</severity>
              <enabled>true</enabled>
            </rca.MismatchAttribute>
          </mismatchAttributes>
        </rca.AuditPolicyEntry>
      </children-Set>
    </rca.AuditPolicy>
  </childConfigInfo>
</generic.GenericObject.configureChildInstance>
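As described earlier in this section, a policy created without any audit policy entry has all entries created and enabled by default. A minimal sketch of such a request for an IS-IS audit policy could look like the following. It assumes that the <children-Set> element can simply be left out and that the NFM-P assigns the ID when the optional <id> tag is omitted; neither variation appears in the original figure, so verify the request against the XML API Reference before use.

```xml
<!-- Sketch only: IS-IS RCA audit policy with default entries.    -->
<!-- Assumption: omitting <children-Set> leaves all audit entries -->
<!-- created and enabled; omitting <id> lets the NFM-P assign it. -->
<generic.GenericObject.configureChildInstance xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <distinguishedName>rca-mgr</distinguishedName>
  <childConfigInfo>
    <rca.AuditPolicy>
      <actionMask>
        <bit>create</bit>
      </actionMask>
      <policyName>isis</policyName>
    </rca.AuditPolicy>
  </childConfigInfo>
</generic.GenericObject.configureChildInstance>
```

The <objectFullName> returned in the response identifies the new policy and is the value to use when the audit is run.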
RCA audit policies configuration check method
RCA audits can be triggered manually by using the rca.RcaManager.checkConfig method. The method validates the OSPF or IS-IS routing protocol configuration against the attributes specified in the RCA audit policy.
Check config method
- <fdn> - tpgy-mgr:name-${displayedName}-AS-${asNumber}:protocol-${protocol}-area-${areaId}, which is a pointer to the OSPF area on which to perform the configuration check. The FDN can be determined by looking at the Parent Hierarchy of the class in the XML API Reference. In this case, it is the topology.Area class.
- <fdnPolicy> - rca-mgr:ID-${*Id}, which is a pointer to the RCA audit policy. In this case, it is the rca.AuditPolicy class.
- <aInContext> - specifies the rca.Context. The value for this context is null for an OSPF or IS-IS RCA audit, but the tags must be specified to make the request valid.
- <resultFilter> - (optional) filter for reducing the scope of the returned information
Figure 21-22: RCA audit execution request example
<rca.RcaManager.checkConfig xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <fdn>tpgy-mgr:name-AdminDomain1-AS-1:protocol-ospf-area-0.0.0.0</fdn>
  <fdnPolicy>rca-mgr:ID-100</fdnPolicy>
  <aInContext>
    <rca.Context/>
  </aInContext>
  <resultFilter/>
</rca.RcaManager.checkConfig>
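The optional <resultFilter> can be used to narrow the returned information. The following sketch assumes that checkConfig accepts the <attribute> child elements that result filters use in other XML API requests; this is an assumption and is not confirmed for this method in this chapter. The attribute names are fields of the rca.Problem object, and the FDNs are the ones used in Figure 21-22.

```xml
<!-- Sketch only: audit request that asks for a reduced set of     -->
<!-- rca.Problem fields. Assumption: <resultFilter> takes          -->
<!-- <attribute> children here as it does in other XML API calls.  -->
<rca.RcaManager.checkConfig xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <fdn>tpgy-mgr:name-AdminDomain1-AS-1:protocol-ospf-area-0.0.0.0</fdn>
  <fdnPolicy>rca-mgr:ID-100</fdnPolicy>
  <aInContext>
    <rca.Context/>
  </aInContext>
  <resultFilter>
    <attribute>objectFullName</attribute>
    <attribute>severity</attribute>
    <attribute>description</attribute>
    <attribute>causedByObjectFullNames</attribute>
  </resultFilter>
</rca.RcaManager.checkConfig>
```

If the filter is honored, each returned rca.Problem would carry only the listed fields, which may be convenient when an OSS application needs only the affected object and the severity.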
If there are problems with the configuration, the response is similar to the example in the following figure.
Figure 21-23: RCA audit problem response example
<rca.RcaManager.checkConfigResponse xmlns="xmlapi_1.0">
  <problemList>
    <rca.Problem>
      <lastTimeChanged>1344305122231</lastTimeChanged>
      <id>notFound</id>
      <causedByObjectFullNames>
        <pointer>network:198.51.100.32:router-1:ospf-v2:areaSite-0.0.0.0:interface-5</pointer>
      </causedByObjectFullNames>
      <isFixable>false</isFixable>
      <severity>indeterminate</severity>
      <probableCause>unknown</probableCause>
      <description>Far-End Not found</description>
      <ignoreFix>false</ignoreFix>
      <applicationEnum>unspecified</applicationEnum>
      <solution/>
      <policyFdn>rca-mgr:ID-100</policyFdn>
      <deploymentState>0</deploymentState>
      <objectFullName>network:198.51.100.32:router-1:ospf-v2:areaSite-0.0.0.0:interface-5:cause-unknown-notFound</objectFullName>
      <name>cause-unknown-notFound</name>
      <selfAlarmed>false</selfAlarmed>
      <children-Set/>
    </rca.Problem>
    <rca.Problem>
      <lastTimeChanged>1344305122232</lastTimeChanged>
      <id>aggregated</id>
      <causedByObjectFullNames>
        <pointer>ospf:area-0.0.0.0-version-2</pointer>
      </causedByObjectFullNames>
      <isFixable>false</isFixable>
      <severity>indeterminate</severity>
      <probableCause>aggregatedCause</probableCause>
      <description>Aggregated Cause</description>
      <ignoreFix>false</ignoreFix>
      <applicationEnum>unspecified</applicationEnum>
      <solution/>
      <policyFdn>rca-mgr:ID-100</policyFdn>
      <deploymentState>0</deploymentState>
      <objectFullName>ospf:area-0.0.0.0-version-2:cause-aggregatedCause-aggregated</objectFullName>
      <name>cause-aggregatedCause-aggregated</name>
      <selfAlarmed>false</selfAlarmed>
      <children-Set/>
    </rca.Problem>
  </problemList>
</rca.RcaManager.checkConfigResponse>
CPAM alarms
In addition to routing alarms generated by routers, there are threshold reaching alarms generated by the 7701 CPAA. The alarm generation process uses the routing data collected by the 7701 CPAA. The generated alarms are sent to the CPAM route controller. The alarms are then available to an OSS application in the same manner that alarms generated in the NFM-P are sent to the XML API clients.
See Chapter 11, Fault management for more information about retrieving alarms using the OSS interface.
All CPAM-related JMS event messages belong to the CPAM event category; that is, they carry ALA_category=CPAM. To retrieve CPAM threshold alarms via JMS, include ALA_category=CPAM in the filter.
The following are not included in the CPAM category:
See “Threshold reaching alarms” in the NSP NFM-P Control Plane Assurance Manager User Guide for more information about the CPAM threshold reaching alarms for BGP, IS-IS, and OSPF.
The following table lists some of the CPAM alarm thresholds and the package and class used to set the threshold attributes for each alarm. The list is not complete. See the XML API Reference under the topology package for a complete list.
Table 21-7: CPAM alarm threshold objects
Routing alarm type | Package and class |
---|---|
ISIS LSP Alarm Threshold | topology.IsisLspAlarmThreshold |
ISIS Reachability Threshold | topology.IsisReachabilityAlarmThreshold |
OSPF LSA Rate Alarm Threshold | topology.OspfLsaRatePerRouterAlarmThreshold |
OSPF LSA Alarm Threshold | topology.OspfLsaPerRouterAlarmThreshold |
BGP Route Rate Threshold Per RT Threshold | topology.BgpRouteRateThresholdPerRTargetAlarmThreshold |
BGP AS Path Length Per Monitored Prefix Threshold | topology.BgpAsPathLenPerMonPrefAlarmThreshold |
BGP Packet Rate Threshold | topology.BgpPktRateAlarmThreshold |
BGP Route Count Threshold | topology.BgpRouteCountAlarmThreshold |
BGP Route Flap Rate Threshold | topology.BgpRouteFlapRateAlarmThreshold |
BGP Route Rate Threshold | topology.BgpRouteRateAlarmThreshold |
BGP Monitored Prefix Flap Rate Threshold Reached | topology.BgpMonitorPrefixFlapRateThresholdReached |
BGP Monitored Prefix Unreachable | topology.BgpMonPrefixUnreachable |
BGP Prefix Monitor Redundancy Loss | topology.BgpMonPrefRedundancyLossThresholdReached |
The following are the mandatory XML parameter tags required to develop an XML request to create and set a BGP alarm threshold.
The generic.GenericObject.configureChildInstance method is used to create a BGP route count alarm threshold. The request response returns the <objectFullName> attribute to identify an alarm threshold.
Alarm threshold creation parameters
- distinguishedName - tpgy-rtr-alarm-mgr. The FDN can be determined by looking at the Parent Hierarchy of the class to be created in the XML API Reference. In this case, it is the topology.RoutingAlarmManager class.
- childConfigInfo - creates and configures the following parameters:
  - BGP route count alarm threshold object - topology.BgpRouteCountAlarmThreshold
  - <cpaaPointer> - network:${systemAddress}:cpaa - pointer to the 7701 CPAA
  - <alarmName> - sets the alarm name for this threshold. In Figure 21-24, BGP route count alarm threshold creation request example, the alarm name is BgpRouteCountThresholdReached. See the XML API Reference under the fm.alarmName type for a complete list of alarm names. Search for package=topology for CPAM alarms.
  - <administrativeState> - enables or disables monitoring of the alarm threshold
Figure 21-24: BGP route count alarm threshold creation request example
<generic.GenericObject.configureChildInstance xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <distinguishedName>tpgy-rtr-alarm-mgr</distinguishedName>
  <childConfigInfo>
    <topology.BgpRouteCountAlarmThreshold>
      <actionMask>
        <bit>create</bit>
      </actionMask>
      <cpaaPointer>network:198.51.100.37:cpaa</cpaaPointer>
      <alarmName>topology.BgpRouteCountThresholdReached</alarmName>
      <threshold>1000</threshold>
      <administrativeState>up</administrativeState>
    </topology.BgpRouteCountAlarmThreshold>
  </childConfigInfo>
</generic.GenericObject.configureChildInstance>
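Because <administrativeState> enables or disables monitoring, a threshold can presumably be created in a disabled state and enabled later. The following sketch is a variation of the request in Figure 21-24. It assumes that down is the accepted counterpart of the up value shown in the figure, and the threshold value of 5000 is purely illustrative; check the XML API Reference for the valid enumeration before use.

```xml
<!-- Sketch only: BGP route count threshold created with monitoring -->
<!-- disabled. Assumption: "down" is the valid opposite of "up" for -->
<!-- <administrativeState>; 5000 is an illustrative threshold.      -->
<generic.GenericObject.configureChildInstance xmlns="xmlapi_1.0">
  <deployer>immediate</deployer>
  <distinguishedName>tpgy-rtr-alarm-mgr</distinguishedName>
  <childConfigInfo>
    <topology.BgpRouteCountAlarmThreshold>
      <actionMask>
        <bit>create</bit>
      </actionMask>
      <cpaaPointer>network:198.51.100.37:cpaa</cpaaPointer>
      <alarmName>topology.BgpRouteCountThresholdReached</alarmName>
      <threshold>5000</threshold>
      <administrativeState>down</administrativeState>
    </topology.BgpRouteCountAlarmThreshold>
  </childConfigInfo>
</generic.GenericObject.configureChildInstance>
```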