What is NFM-P system maintenance?
Introduction
A regular maintenance schedule is recommended in order to:
NFM-P system maintenance begins with establishing base measures against which you can evaluate system functionality and identify performance or connectivity issues that need to be corrected.
NFM-P OLC states
You can put NEs in maintenance mode using OLC states, as described in Setting NFM-P OLC states.
NFM-P base measures
NOC operations or engineering staff who are responsible for maintenance can use maintenance base measures to evaluate the activity and performance of network components, for example, client GUI response times when listing equipment.
The data from a series of base measures can be used over time to track performance trends. For example, if there are reports that client GUI response times for listing equipment degrade over time, you can use the base measures to determine how much performance has degraded. The procedures in this guide can help narrow the search for the cause of the degradation.
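As a rough illustration of how a series of base measures can quantify degradation, the following Python sketch computes the percentage change from the first (baseline) measurement and a least-squares trend across all measurements. The function names and the weekly sample values are hypothetical; substitute your own recorded base measures.

```python
from statistics import mean


def percent_degradation(baseline_s: float, current_s: float) -> float:
    """Percentage by which the current response time exceeds the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (current_s - baseline_s) / baseline_s * 100.0


def trend_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of response time against time (seconds per time unit)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in samples)
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need at least two distinct time points")
    return num / den


# Hypothetical weekly base measures: (week number, seconds to list equipment)
history = [(0, 2.0), (1, 2.1), (2, 2.3), (3, 2.5)]
print(f"{percent_degradation(history[0][1], history[-1][1]):.1f}% slower than the baseline")
print(f"trend: {trend_slope(history):+.3f} s per week")
```

A positive slope that persists across many samples is a stronger signal of degradation than a single slow measurement.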
It is recommended to do the following:
- Determine the types of base measures required for your network.
- Regularly collect system information and compare the information with the base measure data.
This section provides base measure information for:
- performance and scalability, to record system limits as a baseline against which NMS response times can be compared
- inventory counts, to generate inventory lists for storage and post-processing
Establishing base measures
Base measures can be affected by issues that are beyond the scope of this guide, including:
The NFM-P service test manager (STM) lets you group OAM diagnostic tests into test suites that you can run as scheduled tasks. You can customize a test suite for your network topology and run it to establish baseline performance information. You can retain the test suite, modify it as the network topology changes, and run it again to establish new base measures as required. Running the test suite on a schedule and regularly reviewing the results can reveal deviations from the baseline. See the NSP NFM-P User Guide for information about using the STM and creating scheduled tasks.
Platform base measures
You can use platform base measures to:
- track network-specific growth to provide a delta for performance measures, for example, how long it takes to list 1000 ports on the current station compared to 10 000 ports on the same station, or on a smaller or larger station
Table 21-4: Platform base data
Component | Platform information |
---|---|
Main server 1 | RAM; CPU (quantity, type, speed); OS version and patch level |
Main database 1 | RAM; CPU (quantity, type, speed); OS version and patch level |
Main server 2 | RAM; CPU (quantity, type, speed); OS version and patch level; Swap space; Disk slices |
Main database 2 | RAM; CPU (quantity, type, speed); OS version and patch level; Swap space; Database disk file systems; Disk slice sizes |
Auxiliary server 1 | RAM; CPU (quantity, type, speed); OS version and patch level; Swap space; Disk slice sizes |
Auxiliary server 2 | RAM; CPU (quantity, type, speed); OS version and patch level; Swap space; Disk slice sizes |
Client delegate server | OS type, version, and patch level; RAM; CPU; Disk space; Monitor; Graphics card |
Single-user GUI client | OS type, version, and patch level; RAM; CPU; Disk space; Monitor; Graphics card |
Single-user GUI client | RAM; CPU; OS type, version, and patch level; Disk space; Monitor; Graphics card |
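Some of the platform information in Table 21-4 can be collected with a short script. The following Python sketch is an assumption about how you might gather the data on each station, not an NFM-P tool. It uses only the standard library and covers the hostname, OS name and release, logical CPU count and architecture, physical RAM, and disk capacity. RAM is read through `os.sysconf`, which is typically available only on Linux and some other UNIX-like systems. Patch level, CPU speed, swap space, and disk slice layout are not exposed portably by the standard library and must be recorded using OS-specific tools.

```python
import os
import platform
import shutil
from typing import Optional


def physical_ram_bytes() -> Optional[int]:
    """Total physical RAM from sysconf, or None where sysconf does not provide it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def platform_record(path: str = "/") -> dict:
    """Collect the portable subset of the Table 21-4 fields for this host."""
    usage = shutil.disk_usage(path)
    return {
        "hostname": platform.node(),
        "os": f"{platform.system()} {platform.release()}",
        "os_build": platform.version(),
        "cpu_count": os.cpu_count(),  # logical CPUs; None if unknown
        "architecture": platform.machine(),
        "ram_bytes": physical_ram_bytes(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


if __name__ == "__main__":
    for key, value in platform_record().items():
        print(f"{key}: {value}")
```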
Inventory base measures
You can use inventory base measures to:
- track network-specific growth to provide a delta for any performance measures, for example, how long it takes 5 versus 15 client GUIs to list 1000 ports
Use the following sequence to create inventory base measures, for example, for access ports. You can modify the sequence to create additional inventory base measures for other objects.
- Determine the type of object data for which you need to create inventory records, for example, access ports.
- List the ports of all managed network devices using the client GUI manage equipment window, or create an XML API request to generate the list.
- Format the inventory for future processing, based on your inventory processing requirements.
- Generate the inventory data, using the same listing and filtering criteria, on a weekly or monthly basis, as necessary to track changes to the network.
  When new devices are added to the network on a regular basis, increase the inventory frequency.
- Use the generated list to record the current inventory of network objects and as a baseline measure of performance.
  For example, baseline the time required to generate a client GUI list of 1000 access ports.
When an access port list is generated later, record the time required to generate it, for example, for a list of 2000 ports. Ideally, listing twice as many ports takes about twice as long, so the listing time per port stays roughly constant. If listing time grows strongly nonlinearly with the number of ports, there may be scalability issues that require investigation.
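The linearity check described above can be expressed as a scaling exponent. If listing time t grows as t ≈ c·n^k for n ports, two measurements give k = log(t2/t1) / log(n2/n1). A value of k near 1 means linear growth; a value well above 1 indicates the nonlinear behavior that warrants investigation. In the following Python sketch, the function name, the 1.2 tolerance, and the sample timings are hypothetical.

```python
import math


def scaling_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Estimate k in t ≈ c * n**k from two (port count, listing time) measurements.

    k close to 1 means listing time grows linearly with the number of ports;
    k well above 1 means it grows faster than linearly.
    """
    if min(n1, n2) <= 0 or min(t1, t2) <= 0 or n1 == n2:
        raise ValueError("need positive, distinct counts and positive times")
    return math.log(t2 / t1) / math.log(n2 / n1)


# Hypothetical measurements: 1000 ports listed in 4 s at baseline, 2000 ports now in 16 s.
k = scaling_exponent(1000, 4.0, 2000, 16.0)
print(f"scaling exponent k = {k:.2f}")
if k > 1.2:  # hypothetical tolerance; choose one that suits your network
    print("listing time is growing faster than linearly; investigate")
```

Using an exponent rather than a raw time ratio lets you compare measurements taken at different port counts on the same scale.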
Performance and scalability base measures
You can use the following performance and scalability base measures to:
- record the system limits and compare them with the measurement data collected in your network
- track network-specific growth to provide a delta for any performance measures on similarly sized platforms, for example, how long it takes to discover 10 new devices versus 20 new devices
Table 21-5: Scalability base measures
Performance base measures
Commonly available network tools such as ping, which uses ICMP to measure round trip time, can be used to determine quantities such as packet loss and round trip delay. See the ping command information in this guide, and the NSP Troubleshooting Guide, for more information about running the commands.
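If you collect ping results from a script, you can extract packet loss and average round trip time from the summary that ping prints. The following Python sketch assumes the summary format of the Linux iputils ping (`rtt min/avg/max/mdev`) and also accepts the BSD/macOS variant (`round-trip min/avg/max/stddev`). Other ping implementations format their output differently, so verify the patterns against your platform. The hypothetical `run_ping` helper uses the `-c` count option, which these two implementations share.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingSummary:
    transmitted: int
    received: int
    loss_percent: float
    rtt_avg_ms: Optional[float]  # None when no replies were received


_COUNTS = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
_LOSS = re.compile(r"([\d.]+)% packet loss")
_RTT = re.compile(r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = [\d.]+/([\d.]+)/")


def parse_ping_summary(output: str) -> PingSummary:
    """Extract packet counts, loss, and average RTT from a ping summary."""
    counts = _COUNTS.search(output)
    loss = _LOSS.search(output)
    if counts is None or loss is None:
        raise ValueError("unrecognized ping output")
    rtt = _RTT.search(output)
    return PingSummary(
        transmitted=int(counts.group(1)),
        received=int(counts.group(2)),
        loss_percent=float(loss.group(1)),
        rtt_avg_ms=float(rtt.group(1)) if rtt else None,
    )


def run_ping(host: str, count: int = 4) -> PingSummary:
    """Run the system ping (Linux/macOS '-c' syntax) and parse its summary."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=False)
    return parse_ping_summary(result.stdout)


# Hypothetical iputils output for a documentation-range address.
sample = """--- 192.0.2.10 ping statistics ---
4 packets transmitted, 3 received, 25% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.412/0.530/0.701/0.120 ms"""
print(parse_ping_summary(sample))
```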
- Packet loss is defined as the fraction of packets sent from a measurement agent to a test point for which the measurement agent does not receive an acknowledgement from the test point. Packets whose acknowledgements do not arrive at the measurement agent within a predefined round trip delay are considered lost.
- Round trip delay is defined as the interval between the time a measurement agent sends a packet to a test point and the time it receives acknowledgement that the packet was received by the test point.
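The two definitions above can be applied directly to a set of probe records. In the following Python sketch, each probe stores its send time and its acknowledgement time (or None). A probe whose acknowledgement is missing or arrives after the predefined timeout counts as lost, and the round trip delay of every other probe is its acknowledgement time minus its send time. The `Probe` type, the function name, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    sent_at: float             # seconds on the measurement agent's clock
    acked_at: Optional[float]  # None if no acknowledgement arrived


def loss_and_delays(probes: list[Probe], timeout_s: float) -> tuple[float, list[float]]:
    """Return (packet loss fraction, round trip delays of acknowledged probes).

    A probe counts as lost when its acknowledgement is missing or arrives
    more than timeout_s after the probe was sent.
    """
    if not probes:
        raise ValueError("no probes recorded")
    delays = [
        p.acked_at - p.sent_at
        for p in probes
        if p.acked_at is not None and p.acked_at - p.sent_at <= timeout_s
    ]
    lost = len(probes) - len(delays)
    return lost / len(probes), delays


# Hypothetical probes: two acknowledged, one unanswered, one answered too late.
probes = [Probe(0.0, 0.020), Probe(1.0, 1.025), Probe(2.0, None), Probe(3.0, 5.5)]
loss, delays = loss_and_delays(probes, timeout_s=2.0)
print(f"packet loss: {loss:.0%}, round trip delays (ms): {[round(d * 1000, 1) for d in delays]}")
```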
You can baseline the packet loss and round trip delay results for specific NMS LAN and network scenarios. Record those results as the baseline against which to compare the results of regularly run packet loss and round trip delay tests.
Reachability base measures
System reachability is important in business-critical systems. Service reachability components are:
- Is the NE reachable? (NE reachability)
- If so, is the service available for customer use? (service availability)
- If not, how frequently do service outages occur, and how long do they last? (service outage duration)
The types of measures and baselines needed to ensure reachability and availability depend on the network: they vary with the network topology, the networking technologies used to move data, and the types of equipment used.
NE reachability
A test point is reachable from a measurement agent when the agent can send packets to the test point and receive a response confirming that the packets were received. You can test reachability using ping or OAM diagnostics run from the NFM-P or from the device CLI. Record the test results to create a measurement baseline.
These tests can be performed when you troubleshoot a customer service, or when you perform SLA tests before you enable a customer service.
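Ping relies on ICMP, and sending raw ICMP packets from a script usually requires elevated privileges. As a substitute technique, the following Python sketch tests reachability with a TCP connection attempt using the standard `socket.create_connection` call. A successful connection shows only that a service is listening on the chosen port. For example, you might check SSH on port 22, which is an assumption about how your NEs are configured. This check therefore complements ping and OAM diagnostics rather than replacing them. The function names are hypothetical.

```python
import socket
import time


def is_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s.

    Substitutes a TCP connect for an ICMP echo; it shows only that something
    is listening on that port, not that every service on the NE is working.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unresolvable host, or no route
        return False


def reachability_record(host: str, port: int) -> dict:
    """One baseline row: when the check ran, what was tested, and the result."""
    return {"time": time.time(), "host": host, "port": port,
            "reachable": is_reachable(host, port)}
```

Appending one `reachability_record` row per scheduled check gives a time series that can serve as the measurement baseline.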
Service availability
The network between a measurement agent and a test point is considered available at a given time when the measured packet loss rate and the round trip delays are both below predefined thresholds. The threshold values depend on the network topology. You can test service availability using ping or OAM diagnostics run from the NFM-P or from the device CLI. Record the test results to create a measurement baseline.
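The availability definition combines two thresholds. The following Python sketch marks each measurement interval as available only when both the packet loss fraction and the round trip delay are below their thresholds. It assumes one summary round trip delay per interval (here, the average); you could use a maximum or a percentile instead. The `Thresholds` type, the threshold values, and the interval data are hypothetical, because suitable thresholds depend on your network topology.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_loss_fraction: float  # for example, 0.01 for 1 % loss
    max_rtt_ms: float


def is_available(loss_fraction: float, rtt_ms: float, t: Thresholds) -> bool:
    """Available only when packet loss AND round trip delay are both below threshold."""
    return loss_fraction < t.max_loss_fraction and rtt_ms < t.max_rtt_ms


# Hypothetical measurement intervals: (packet loss fraction, average round trip delay in ms)
intervals = [(0.0, 12.0), (0.0, 14.5), (0.05, 13.0), (0.0, 95.0), (0.0, 11.8)]
limits = Thresholds(max_loss_fraction=0.01, max_rtt_ms=50.0)  # hypothetical values
flags = [is_available(loss, rtt, limits) for loss, rtt in intervals]
print(f"available in {sum(flags)} of {len(flags)} intervals ({sum(flags) / len(flags):.0%})")
```

In this sample, the third interval fails on packet loss and the fourth fails on round trip delay, so either condition alone is enough to make an interval unavailable.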
Service outage duration
The duration of an outage is defined as the difference between the time a service becomes unavailable and the time it is restored. The time between outages is defined as the difference between the start times of two consecutive outages. Troubleshooters who resolve customer problems, or the data generated when resolving SLA issues, can provide the baseline metrics for measuring outages and the time between outages. Record the information to create a measurement baseline.
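The outage definitions above translate directly into date arithmetic. The following Python sketch takes outages as (start, restored) pairs, for example transcribed from trouble tickets. It returns the duration of each outage and the time between the start times of consecutive outages. The function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def outage_metrics(
    outages: list[tuple[datetime, datetime]],
) -> tuple[list[timedelta], list[timedelta]]:
    """Return (durations, times between outages) for (start, restored) pairs.

    Duration = restored - start. Time between outages = difference between the
    start times of two consecutive outages.
    """
    ordered = sorted(outages)
    durations = [end - start for start, end in ordered]
    between = [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]
    return durations, between


# Hypothetical outage records for one service.
outages = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 2, 45)),
    (datetime(2024, 3, 8, 14, 10), datetime(2024, 3, 8, 14, 20)),
    (datetime(2024, 3, 20, 23, 30), datetime(2024, 3, 21, 0, 30)),
]
durations, between = outage_metrics(outages)
print("durations:", [str(d) for d in durations])
print("time between outages:", [str(b) for b in between])
```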