The troubleshooting process
Identifying network performance issues
The troubleshooting process identifies and resolves performance issues related to a network service or component. A performance issue can result in service degradation or in a complete network failure.
The first step in problem resolution is to identify the problem. A problem can be identified through an alarm received from a network component, an analysis of network capacity and performance data, or a customer problem report.
The personnel responsible for troubleshooting the problem must:
-
understand the designed state and behavior of the network, and the services that use the network
-
recognize and identify symptoms that impact the intended function and performance of the product
Network maintenance
The most effective method to prevent problems is to schedule and perform routine maintenance on your network. Major networking problems often start as minor performance issues. See the NSP System Administrator Guide for more information about how to perform routine maintenance on your network.
Troubleshooting problem-solving model
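An effective troubleshooting problem-solving model includes the following tasks, each of which is described in a section below:
-
Establish a performance baseline
-
Categorize the problem
-
Identify the root cause of the problem
-
Plan corrective action and resolve the problem
-
Verify the solution to the problem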
See Process to troubleshoot a problem in the NSP for information about how the problem-solving model aligns with using the NSP to troubleshoot a network or network management problem.
Establish a performance baseline
You must have a thorough knowledge of your network and how it operates under normal conditions to troubleshoot problems effectively. This knowledge facilitates the identification of fault conditions in your network. You must establish and maintain baseline information for your network and services. The maintenance of the baseline information is critical because a network is not a static environment.
See the NSP System Administrator Guide for more information on how to generate NSP system baseline information.
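As a minimal sketch of what maintaining baseline information can look like in practice, the following Python example summarizes a set of samples collected under normal conditions and checks whether a later reading deviates from them. The sample values, the function names, and the tolerance of three standard deviations are illustrative assumptions; they are not NSP functions or recommended thresholds.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize normal-condition samples (hypothetical utilization percentages)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def deviates(baseline, value, tolerance=3.0):
    """Return True when value lies more than `tolerance` standard deviations from the mean."""
    return abs(value - baseline["mean"]) > tolerance * baseline["stdev"]

# Assumed data: utilization samples collected while the network behaved normally.
normal_utilization = [41.0, 43.5, 39.8, 42.2, 40.6, 44.1, 41.9]
baseline = build_baseline(normal_utilization)

print(deviates(baseline, 42.0))  # a reading close to the baseline mean
print(deviates(baseline, 78.0))  # a reading far above the baseline mean
```

Because a network is not a static environment, a baseline like this one would need to be rebuilt from fresh samples whenever the network or its services change.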
Categorize the problem
When you categorize a problem, you must differentiate between total failures and problems that result in a degradation in performance. For example, the failure of an access switch results in a total failure for a customer who has one DS3 link into a network. A core router that operates at over 80% average utilization can start to discard packets, which results in a degradation of performance for services that use the device. Performance degradations exhibit different symptoms from total failures and may not generate alarms or significant network events.
Multiple problems can simultaneously occur and create related or unique symptoms. Detailed information about the symptoms that are associated with the problem helps the NOC or engineering operational staff diagnose and fix the problem. The following information can help you assess the scope of the problem:
Use the following guidelines to help you categorize the problem:
-
Is there an alarm or network event that is associated with the problem?
-
Has anything changed in the network since it last functioned properly?
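The distinction between a total failure and a performance degradation can be sketched as a small classification routine. The `Observation` record, its field names, and the use of 80% as a cut-off (borrowed from the core router example above) are assumptions made for illustration only; they do not represent an NSP data model or a recommended threshold.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the 80% average utilization example in the text.
DEGRADATION_UTILIZATION = 80.0

@dataclass
class Observation:
    """Hypothetical snapshot of a device or link."""
    reachable: bool
    avg_utilization: float       # percent
    active_alarms: int = 0       # answer to "is there an associated alarm or event?"
    recent_change: bool = False  # answer to "has anything changed in the network?"

def categorize(obs):
    """Classify an observation as a total failure, a degradation, or normal."""
    if not obs.reachable:
        return "total failure"
    if obs.avg_utilization > DEGRADATION_UTILIZATION:
        return "degradation"
    return "normal"

failed_switch = Observation(reachable=False, avg_utilization=0.0, active_alarms=1)
busy_router = Observation(reachable=True, avg_utilization=86.5)

print(categorize(failed_switch))
print(categorize(busy_router))
```

The second observation carries no alarms yet is still classified as a degradation, which mirrors the point that degradations may not generate alarms or significant network events.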
Identify the root cause of the problem
A symptom of a problem can be the result of more than one network issue. You can resolve multiple, related problems by resolving their common root cause.
Use the following guidelines to help you implement a systematic approach to resolve the root cause of the problem:
-
Identify common symptoms across different areas of the network.
-
Divide the problem based on network segments and try to isolate the problem to one of the segments.
Examples of network segments are:
-
Extrapolate from network alarms and network events the cause of the symptoms. Try to reproduce the problem.
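One way to divide a problem by network segment is to bisect an ordered path until a single segment remains. The sketch below assumes that exactly one segment is at fault and that a probe is available to report whether traffic passes cleanly up to a given point. The segment names and the `path_ok_through` probe are hypothetical placeholders, not NSP features.

```python
def isolate_faulty_segment(segments, path_ok_through):
    """Binary-search an ordered path for the first failing segment.

    `path_ok_through(i)` is a hypothetical probe that returns True when
    traffic passes cleanly through segments[0..i]. Assumes a single fault.
    """
    low, high = 0, len(segments) - 1
    while low < high:
        mid = (low + high) // 2
        if path_ok_through(mid):
            low = mid + 1   # the fault lies beyond mid
        else:
            high = mid      # the fault is at or before mid
    return segments[low]

# Placeholder segment names along one service path (assumed for illustration).
segments = ["segment-1", "segment-2", "segment-3", "segment-4"]

# Simulated probe: the fault is injected at index 2 for this example.
faulty_index = 2

def simulated_probe(i):
    return i < faulty_index

print(isolate_faulty_segment(segments, simulated_probe))
```

Bisection needs far fewer probes than testing each segment in turn on a long path, but it relies on the single-fault assumption; when multiple problems occur simultaneously, each isolated segment should be rechecked after it is repaired.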
Plan corrective action and resolve the problem
The corrective action required to resolve a problem depends on the problem type. The problem severity and associated QoS commitments affect the approach to resolving the problem. You must balance the risk of creating further service interruptions against restoring service in the shortest possible time.
Corrective action should:
Verify the solution to the problem
You must make sure that the corrective action did not introduce new symptoms in your network. If new symptoms are detected, or if the problem has only been mitigated rather than fully resolved, you need to repeat the troubleshooting process.
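The verification step can be sketched as a comparison of the symptoms observed before and after the corrective action. The symptom strings and the `verify_fix` function are illustrative assumptions; in practice the symptom sets would come from alarms, events, and performance data.

```python
def verify_fix(symptoms_before, symptoms_after):
    """Compare symptom sets and decide whether troubleshooting must be repeated."""
    resolved = symptoms_before - symptoms_after
    remaining = symptoms_before & symptoms_after
    introduced = symptoms_after - symptoms_before
    return {
        "resolved": resolved,
        "remaining": remaining,
        "introduced": introduced,
        # Repeat the process if anything is left over or newly introduced.
        "repeat": bool(remaining or introduced),
    }

# Hypothetical symptom descriptions.
before = {"packet loss on link A", "high latency for service X"}
after = {"high latency for service X", "route flapping on router B"}

result = verify_fix(before, after)
print(sorted(result["resolved"]))
print(sorted(result["remaining"]))
print(sorted(result["introduced"]))
print(result["repeat"])
```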
Checklist for identifying problems
When a problem is identified in the network management domain, track and store data to use for troubleshooting purposes:
-
Determine the type of problem.
Review the sequence of events before the problem occurred:
-
Check the documentation or your procedural information to verify that the steps you performed followed documented standards and procedures.
-
Check the alarm log for any generated alarms that are related to the problem.
-
Record any system-generated messages, such as error dialog boxes, for future troubleshooting.
-
If you receive an error message, perform the actions recommended in the error dialog box, client GUI dialog box, SOAP exception response, or event notification.
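The data gathered through the checklist above can be kept in a simple structured record so that it is available for later troubleshooting. The following sketch is only one possible layout: the `ProblemRecord` class, its field names, and the example strings are assumptions, not an NSP format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProblemRecord:
    """Hypothetical record for one problem; field names are illustrative."""
    problem_type: str
    procedure_followed: bool
    related_alarms: list = field(default_factory=list)
    system_messages: list = field(default_factory=list)
    actions_taken: list = field(default_factory=list)

record = ProblemRecord(problem_type="network management", procedure_followed=True)
record.related_alarms.append("example alarm text from the alarm log")
record.system_messages.append("example error dialog text")
record.actions_taken.append("performed the action recommended in the error dialog")

# Serialize the record so that it can be stored or attached to a support request.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```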
During troubleshooting:
-
Keep both the Nokia documentation and your company policies and procedures nearby.
-
Check the appropriate release notice from the Nokia Support Documentation Service for any release-specific problems, restrictions, or usage recommendations that relate to your problem.
-
If you need help, confirmation, or advice, contact your TAC or technical support representative. See Table 1-1, General NSP problem types, to collect the appropriate information before you call support.
-
Contact your TAC or technical support representative if your company guidelines conflict with Nokia documentation recommendations or procedures.