The troubleshooting process

Identifying network performance issues

The troubleshooting process identifies and resolves performance issues related to a network service or component. The performance issue can result in service degradation, or in a complete network failure.

The first step in problem resolution is to identify the problem. A problem can be identified from an alarm received from a network component, an analysis of network capacity and performance data, or a customer problem report.

The personnel responsible for troubleshooting the problem must:

understand the designed state and behavior of the network, and the services that use the network
recognize and identify symptoms that impact the intended function and performance of the product

Network maintenance

The most effective method to prevent problems is to schedule and perform routine maintenance on your network. Major networking problems often start as minor performance issues. See the NSP System Administrator Guide for more information about how to perform routine maintenance on your network.

Troubleshooting problem-solving model

An effective troubleshooting problem-solving model includes the following tasks:

Establish a performance baseline.
Categorize the problem.
Identify the root cause of the problem.
Plan corrective action and resolve the problem.
Verify the solution to the problem.

See Process to troubleshoot a problem in the NSP for information about how the problem-solving model aligns with using the NSP to troubleshoot a network or network management problem.

Establish a performance baseline

You must have a thorough knowledge of your network and how it operates under normal conditions to troubleshoot problems effectively. This knowledge facilitates the identification of fault conditions in your network. You must establish and maintain baseline information for your network and services. The maintenance of the baseline information is critical because a network is not a static environment.

See the NSP System Administrator Guide for more information on how to generate NSP system baseline information.
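As a minimal illustration of how baseline information can be used, the following Python sketch summarizes utilization samples collected under normal conditions and flags a new sample that deviates strongly from them. The function names, the sample values, and the three-standard-deviation tolerance are assumptions chosen for illustration; they are not part of the NSP.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize utilization samples (percent) taken under normal conditions."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def deviates_from_baseline(value, baseline, tolerance=3.0):
    """Return True when value lies more than `tolerance` standard
    deviations from the baseline mean (tolerance is an assumed default)."""
    if baseline["stdev"] == 0:
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) > tolerance * baseline["stdev"]


# Hypothetical daily average link utilization for one normal week.
normal_week = [41.0, 43.5, 39.8, 42.2, 40.6, 44.1, 42.8]
baseline = build_baseline(normal_week)

print(deviates_from_baseline(43.0, baseline))  # within normal variation
print(deviates_from_baseline(71.0, baseline))  # far outside the baseline
```

Because a network is not a static environment, a baseline of this kind needs to be rebuilt from fresh samples at regular intervals.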

Categorize the problem

When you categorize a problem, you must differentiate between total failures and problems that result in a degradation in performance. For example, the failure of an access switch results in a total failure for a customer who has one DS3 link into a network. A core router that operates at over 80% average utilization can start to discard packets, which results in a degradation of performance for services that use the device. Performance degradations exhibit different symptoms from total failures and may not generate alarms or significant network events.
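The distinction between a total failure and a performance degradation can be sketched in Python as follows. The 80% utilization threshold comes from the example above; the function name, its parameters, and the category labels are hypothetical and would need to be adapted to the data that your network actually provides.

```python
from enum import Enum


class ProblemCategory(Enum):
    TOTAL_FAILURE = "total failure"
    DEGRADATION = "performance degradation"
    NONE_DETECTED = "no problem detected"


# Percent average utilization; taken from the core router example in the text.
UTILIZATION_THRESHOLD = 80.0


def categorize(reachable, avg_utilization, discarded_packets=0):
    """Classify a device state (all inputs are assumed, pre-collected values)."""
    if not reachable:
        return ProblemCategory.TOTAL_FAILURE
    if avg_utilization > UTILIZATION_THRESHOLD or discarded_packets > 0:
        return ProblemCategory.DEGRADATION
    return ProblemCategory.NONE_DETECTED


# A failed access switch on a customer's only link: total failure.
print(categorize(reachable=False, avg_utilization=0.0))
# A core router above the threshold that discards packets: degradation.
print(categorize(reachable=True, avg_utilization=86.0, discarded_packets=120))
```

A degradation may not raise an alarm, so a check of this kind relies on statistics that are collected regularly rather than on alarm events alone.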

Multiple problems can simultaneously occur and create related or unique symptoms. Detailed information about the symptoms that are associated with the problem helps the NOC or engineering operational staff diagnose and fix the problem. The following information can help you assess the scope of the problem:

alarm files
error logs
network statistics
network analyzer traces
output of CLI show commands
accounting logs
customer problem reports

Use the following guidelines to help you categorize the problem:

Is the problem intermittent or static?
Is there a pattern associated with intermittent problems?
Is there an alarm or network event that is associated with the problem?
Is there congestion in the routers or network links?
Has there been a change in the network since it last functioned properly?
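The first two guidelines can be approximated with a short Python sketch. It assumes that you have a series of poll results, one per polling interval, where True means that the symptom was present; this data format and both function names are hypothetical.

```python
def classify_persistence(observations):
    """Return 'static' if the symptom persists from its first appearance,
    'intermittent' if it clears at least once, or 'not observed'."""
    if not any(observations):
        return "not observed"
    first = observations.index(True)
    return "static" if all(observations[first:]) else "intermittent"


def onset_intervals(observations):
    """Return the gaps, in polling intervals, between successive onsets."""
    onsets = [
        i for i, present in enumerate(observations)
        if present and (i == 0 or not observations[i - 1])
    ]
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


polls = [False, True, False, False, True, False, False, True]
print(classify_persistence(polls))  # intermittent
print(onset_intervals(polls))       # [3, 3]
```

Equal gaps between onsets, as in this example, suggest a pattern that may correlate with a scheduled task or a periodic traffic peak.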

Identify the root cause of the problem

A symptom of a problem can be the result of more than one network issue. You can resolve multiple, related problems by resolving their common root cause.

Use the following guidelines to help you implement a systematic approach to resolve the root cause of the problem:

Identify common symptoms across different areas of the network.
Focus on the resolution of a specific problem.
Divide the problem based on network segments and try to isolate the problem to one of the segments.
Examples of network segments are:
- LAN switching (edge access)
- LAN routing (distribution, core)
- metropolitan area
- WAN (national backbone)
- partner services (extranet)
- remote access services
Determine the network state before the problem appeared.
Extrapolate from network alarms and network events the cause of the symptoms. Try to reproduce the problem.
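The guideline to divide the problem by network segment can be sketched in Python as a walk along the path that a service takes, stopping at the first segment that fails a test. The segment names are taken from the examples above; the probe function and its simulated results are assumptions, because the actual test (for example, a ping or an OAM diagnostic) depends on your network.

```python
# Segments listed in the order that a hypothetical service traverses them.
SEGMENTS = [
    "LAN switching (edge access)",
    "LAN routing (distribution, core)",
    "metropolitan area",
    "WAN (national backbone)",
]


def isolate_segment(segments, probe):
    """Return the first segment for which probe(segment) fails, or None
    if every segment passes. probe is a caller-supplied test function."""
    for segment in segments:
        if not probe(segment):
            return segment
    return None


# Simulated probe results; a real probe would test the live network.
simulated_results = {
    "LAN switching (edge access)": True,
    "LAN routing (distribution, core)": True,
    "metropolitan area": False,
    "WAN (national backbone)": True,
}


def probe(segment):
    return simulated_results[segment]


print(isolate_segment(SEGMENTS, probe))  # metropolitan area
```

Testing the segments in path order keeps the search systematic and narrows the investigation to one segment at a time.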

Plan corrective action and resolve the problem

The corrective action required to resolve a problem depends on the problem type. The problem severity and associated QoS commitments affect the approach to resolving the problem. You must balance the risk of creating further service interruptions against restoring service in the shortest possible time.

Perform the following tasks when you implement a corrective action:

Document each step of the corrective action.
Test the corrective action.
Use the CLI to verify behavior changes in each step.
Apply the corrective action to the live network.
Test to verify that the corrective action resolved the problem.

Verify the solution to the problem

You must make sure that the corrective action associated with the resolution of the problem did not introduce new symptoms in your network. If new symptoms are detected, or if the problem has only been mitigated rather than resolved, you need to repeat the troubleshooting process.
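One way to make this check explicit is to compare the symptoms recorded before and after the corrective action. The following Python sketch assumes that symptoms are recorded as short text labels; the labels and function names are hypothetical.

```python
def compare_symptoms(before, after):
    """Split symptoms into resolved, remaining, and newly introduced."""
    before, after = set(before), set(after)
    return {
        "resolved": sorted(before - after),
        "remaining": sorted(before & after),
        "new": sorted(after - before),
    }


def must_repeat_troubleshooting(result):
    """Repeat the process if any symptom remains or a new one appeared."""
    return bool(result["remaining"] or result["new"])


# Hypothetical symptom labels recorded by the troubleshooting personnel.
before = ["packet discards on core router", "slow customer VPN"]
after = ["slow customer VPN", "route flapping"]

result = compare_symptoms(before, after)
print(result["new"])                        # ['route flapping']
print(must_repeat_troubleshooting(result))  # True
```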

Checklist for identifying problems

When a problem is identified in the network management domain, track and store data to use for troubleshooting purposes:

Determine the type of problem.
Review the sequence of events before the problem occurred:
- Trace the actions that were performed to see where the problem occurred.
- Identify what changed before the problem occurred.
- Determine whether the problem happened before under similar conditions.
Check the documentation or your procedural information to verify that the steps you performed followed documented standards and procedures.
Check the alarm log for any generated alarms that are related to the problem.
Record any system-generated messages, such as error dialog boxes, for future troubleshooting.
If you receive an error message, perform the actions recommended in the error dialog box, client GUI dialog box, SOAP exception response, or event notification.

During troubleshooting:

Keep both the Nokia documentation and your company policies and procedures nearby.
Check the appropriate release notice from the Nokia Support Documentation Service for any release-specific problems, restrictions, or usage recommendations that relate to your problem.
If you need help, confirmation, or advice, contact your TAC or technical support representative. See Table 1-1, General NSP problem types, to collect the appropriate information before you call support.
Contact your TAC or technical support representative if your company guidelines conflict with Nokia documentation recommendations or procedures.
Perform troubleshooting based on your network requirements.