What is NFM-P system maintenance?

Introduction

Implementing a regular maintenance schedule is recommended in order to:

  • prevent downtime caused by software, platform, or network failure

  • enable maximum system performance

NFM-P system maintenance begins with establishing base measures against which you can evaluate system functionality and correct performance or connectivity issues.

NFM-P OLC states

You can put NEs in maintenance mode using OLC states, as described in Setting NFM-P OLC states.

NFM-P base measures

NOC operations or engineering staff who are responsible for maintenance can use maintenance base measures to evaluate the activity and performance of network components, for example, client GUI response times when listing equipment.

The data from a series of base measures can be used, over time, to track performance trends. For example, if there are reports that client GUI response times for listing equipment degrade over time, you can use the base measures to determine how much performance has degraded. The procedures in this guide can help narrow the search for the cause of the degradation.

It is recommended to do the following:

  • Determine the types of base measures required for your network.

  • Record base-measure data.

  • Regularly collect system information and compare the information with the base measure data.

This section provides base measure information for:

  • platform—to ensure system sizes are tracked

  • performance and scalability—to record system limits as a baseline against which to compare NMS response times

  • inventory counts—to generate inventory lists for storage and post-processing

  • reachability—to ensure that customer services are available

Establishing base measures

Base measures can be affected by issues that are beyond the scope of this guide, including:

  • network topology design

  • NOC or operations area LAN design

The NFM-P service test manager (STM) provides the ability to group OAM diagnostic tests into test suites that you can run as scheduled tasks. You can customize a test suite to your network topology and execute the test suite to establish baseline performance information. You can retain the test suite, modify it to accommodate network topology changes, and execute the test suite to establish new base measures as required. Scheduled execution of the test suite and regular review of the results may reveal deviations from the baseline. See the NSP NFM-P User Guide for information about using the STM and creating scheduled tasks.

Platform base measures

You can use platform base measures to:

  • record the details of the platform configuration

  • track network-specific growth to provide a delta for performance measures, for example, how long it takes to list 1000 ports on the current station compared to 10 000 ports on the same station, or on a smaller or larger station

Table 21-4: Platform base data

Component / Platform information

  Main server 1
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:

  Main database 1
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:

  Main server 2
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:
    Swap space:
    Disk slices:

  Main database 2
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:
    Swap space:
    Database disk file systems:
    Disk slice sizes:

  Auxiliary server 1
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:
    Swap space:
    Disk slice sizes:

  Auxiliary server 2
    RAM:
    CPU (quantity, type, speed):
    OS version and patch level:
    Swap space:
    Disk slice sizes:

  Client delegate server
    OS type, version, patch level:
    RAM:
    CPU:
    Disk space:
    Monitor:
    Graphics card:

  Single-user GUI client
    OS type, version, patch level:
    RAM:
    CPU:
    Disk space:
    Monitor:
    Graphics card:

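The platform details in Table 21-4 can be captured manually or collected with a short script and stored with the other base-measure records. The following is a minimal sketch for a Linux station; it assumes the standard /proc files and the df command are available, and is only an illustration of the kind of data to capture, not an NFM-P tool.

    #!/usr/bin/env python3
    # Collect basic platform data for an NFM-P station (illustrative sketch, Linux only).
    import platform
    import subprocess

    def read_meminfo(key):
        # Return a value such as MemTotal or SwapTotal from /proc/meminfo.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(key + ":"):
                    return line.split(":", 1)[1].strip()
        return "unknown"

    def cpu_summary():
        # Count logical processors and report the first model name from /proc/cpuinfo.
        count, model = 0, "unknown"
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("processor"):
                    count += 1
                elif line.startswith("model name") and model == "unknown":
                    model = line.split(":", 1)[1].strip()
        return "%d x %s" % (count, model)

    if __name__ == "__main__":
        print("OS version and patch level:", platform.platform())
        print("RAM:", read_meminfo("MemTotal"))
        print("Swap space:", read_meminfo("SwapTotal"))
        print("CPU (quantity, type, speed):", cpu_summary())
        # Disk slices / file systems: keep the df output verbatim for the record.
        print(subprocess.run(["df", "-h"], capture_output=True, text=True).stdout)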

Inventory base measures

You can use inventory base measures to:

  • create lists of network objects for future processing

  • track network-specific growth to provide a delta for any performance measures, for example, how long it takes 5 versus 15 client GUIs to list 1000 ports

Use the following sequence to create inventory base measures, for example, for access ports. You can modify the sequence to create additional inventory base measures for other objects.

  1. Determine the type of object data for which you need to create inventory records, for example, access ports.

  2. List the ports of all managed network devices using the client GUI Manage Equipment window, or create an XML API request to generate the list.

  3. Format the inventory for future processing, based on your inventory processing requirements.

  4. Generate the inventory data, using the same listing and filtering criteria, on a weekly or monthly basis, as necessary to track changes to the network.

    When new devices are added to the network on a regular basis, increase the inventory frequency.

  5. Use the generated list to record the current inventory of network objects and as a baseline measure of performance.

    For example, baseline the time required to generate a client GUI list of 1000 access ports.

    When an access port list is generated later, for example when the network has grown to 2000 ports, record the time required to generate the list. Ideally, listing twice as many ports takes about twice as long; if the ratio of listing time to port count is highly nonlinear, there may be scalability issues that require investigation (see the sketch after this sequence).
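The comparison described in step 5 can be scripted once the port counts and listing times are recorded. The following sketch is illustrative only; the 25% tolerance and the sample numbers are assumptions, not NFM-P recommendations.

    # Compare an access-port listing time against a recorded baseline (illustrative sketch).

    def per_port_seconds(port_count, seconds):
        # Seconds needed per listed port; baseline and current runs are compared on this ratio.
        return seconds / port_count

    def check_listing_scalability(baseline, current, tolerance=0.25):
        # baseline and current are (port_count, seconds) pairs; the tolerance is an
        # assumed example value.
        base = per_port_seconds(*baseline)
        cur = per_port_seconds(*current)
        growth = (cur - base) / base
        if growth > tolerance:
            return ("Per-port listing time grew %.0f%% over baseline "
                    "(%.3f s/port -> %.3f s/port); investigate." % (growth * 100, base, cur))
        return "Listing time scales roughly linearly with port count."

    # Example: 1000 ports listed in 40 s at baseline, 2000 ports in 110 s later.
    print(check_listing_scalability((1000, 40.0), (2000, 110.0)))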

Performance and scalability base measures

You can use the following performance and scalability base measures to:

  • record the system limit numbers and compare to the measurement data collected in your network

  • track network-specific growth to provide a delta for any performance measures on similarly sized platforms, for example, how long it takes to discover 10 new devices versus 20 new devices

  • quantify user perceptions of performance

Table 21-5: Scalability base measures

  Total devices managed
    System limits: See the appropriate NSP NFM-P Release Description and NSP Planning Guide for information about release-specific system limits.
    Expected response time: The client GUI is operational XX seconds after launching.
    Network base measure response time:
    Additional information: The time to open icons in the Equipment navigation tree increases depending on the number of configured MDAs.

  Total services
    Expected response time:
      • XML API configuration of 300 VLL services in X min
      • XML API configuration of 100 VPLS services with 3 sites and one SAP in 5 min
    Network base measure response time:
    Additional information: The complexity of the service configuration affects response time. For example, adding additional SAPs to a VPLS increases provisioning time.

  Outstanding alarms
    Expected response time: The client GUI is able to retrieve and display XX 000 alarms in the dynamic alarm list during startup.
    Network base measure response time:

  Client GUIs for each server
    Expected response time: Open a configuration form using the client GUI in X amount of time.
    Network base measure response time:
    Additional information: Measure X against a constant platform size over time.

  Device discovery
    Expected response time: Discover one additional device with an IP address in the X.X.X.1 to 255 range in less than 1 min.
    Network base measure response time:
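One way to populate the Network base measure response time entries is to time the operation under test and log the result beside the expected value. The sketch below is generic; the log file name, the CSV layout, and the list_access_ports function are hypothetical placeholders, not part of NFM-P.

    # Record a timed base measure beside its expected response time (illustrative sketch).
    import csv
    import time
    from datetime import datetime, timezone

    def record_base_measure(name, expected_s, operation, log_file="base_measures.csv"):
        # Time the operation under test and append the result to a simple CSV log.
        start = time.monotonic()
        operation()
        elapsed = time.monotonic() - start
        with open(log_file, "a", newline="") as f:
            csv.writer(f).writerow(
                [datetime.now(timezone.utc).isoformat(), name, expected_s, round(elapsed, 1)])
        return elapsed

    # Example (hypothetical): time a function that lists 1000 access ports, against an
    # expected response time of 60 s.
    # record_base_measure("list 1000 access ports", 60.0, list_access_ports)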

Performance base measures

Commonly available tools such as ping, which measures round trip time using ICMP, can be used to determine packet loss and round trip delay in the network. See the ping command information in this guide, and the NSP Troubleshooting Guide, for more information about using the commands.

  • Packet loss is defined as the fraction of packets sent from a measurement agent to a test point for which the measurement agent does not receive an acknowledgement from the test point. Acknowledgements that do not arrive within a predefined round trip delay at the measurement agent are considered lost.

  • Round trip delay is defined as the interval between the time a measurement agent sends a packet to a test point and the time it receives acknowledgement that the packet was received by the test point.

You can baseline the packet loss results and round trip delay times for specific NMS LAN and network scenarios. Record those results for future baselines against regularly run packet loss and round trip delay tests.
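Packet loss and round trip delay can be sampled with the standard ping command and the results recorded as a baseline. The following sketch shells out to the Linux iputils ping and parses its summary output; other ping implementations format the summary differently, so treat the parsing as an example only.

    # Sample packet loss and round trip delay with ping (illustrative sketch; Linux iputils).
    import re
    import subprocess

    def ping_stats(host, count=20):
        # Send `count` ICMP echo requests and parse the summary lines that ping prints.
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", out)
        rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/", out)  # rtt min/avg/max/mdev line
        return {
            "packet_loss_pct": float(loss.group(1)) if loss else None,
            "rtt_avg_ms": float(rtt.group(2)) if rtt else None,
        }

    # Example: baseline a test point, then compare later runs against the recorded values.
    # print(ping_stats("198.51.100.1"))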

Reachability base measures

System reachability is important in business-critical systems. The components of service reachability are:

  • Can the customer reach the service? (reachability)

  • If so, is the service available for customer use? (service availability)

  • If not, how frequently do service outages occur, and how long do they last? (service outage duration)

The types of measures and baselines necessary to ensure reachability and availability are network-dependent, and vary depending on the topology of the network, the networking technologies used to move data, and the types of equipment used.

NE reachability

A test point is reachable from a measurement agent when the agent can send packets to the test point and receive a response from the test point confirming that the packets were received. The ping test and OAM diagnostics, run from the NFM-P or a device CLI, can test reachability. Record the test results to create a measurement baseline.

These tests can be performed when you troubleshoot a customer service, or when you perform SLA tests before you enable a customer service.

Service availability

The network between a measurement agent and a test point is considered available at a given time when the measured packet loss rate and round trip delay are both below predefined thresholds. The threshold values depend on the network topology. The ping test and OAM diagnostics, run from the NFM-P or a device CLI, can test service availability. Record the test results to create a measurement baseline.
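With packet loss and round trip delay samples in hand, availability at a point in time reduces to a threshold comparison. A minimal sketch follows; the threshold values shown are placeholders, not recommended settings.

    # Decide whether the path to a test point counts as available (illustrative sketch).

    def is_available(packet_loss_pct, rtt_avg_ms, max_loss_pct, max_rtt_ms):
        # Available only when both measurements are below their predefined thresholds.
        return packet_loss_pct < max_loss_pct and rtt_avg_ms < max_rtt_ms

    # Placeholder thresholds; choose values appropriate to your network topology.
    print(is_available(packet_loss_pct=0.5, rtt_avg_ms=12.0, max_loss_pct=1.0, max_rtt_ms=50.0))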

Service outage duration

The duration of an outage is defined as the difference between the time a service becomes unavailable and the time it is restored. The time between outages is defined as the difference between the start times of two consecutive outages. The troubleshooting data collected while resolving customer problems, or the data generated to verify SLA compliance, can provide the baseline metrics for outage duration and time between outages. Record the information to create a measurement baseline.
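Given the start and restore times recorded for each outage, outage duration and time between outages follow directly from these definitions. A minimal sketch, assuming ISO 8601 timestamps:

    # Compute outage duration and time between outages from recorded timestamps
    # (illustrative sketch).
    from datetime import datetime

    def outage_metrics(outages):
        # outages: chronologically ordered (start, restore) ISO 8601 timestamp pairs.
        parsed = [(datetime.fromisoformat(s), datetime.fromisoformat(r)) for s, r in outages]
        durations = [restore - start for start, restore in parsed]
        between = [parsed[i + 1][0] - parsed[i][0] for i in range(len(parsed) - 1)]
        return durations, between

    # Example with two recorded outages.
    durations, between = outage_metrics([
        ("2024-05-01T10:00:00", "2024-05-01T10:12:00"),
        ("2024-05-08T02:30:00", "2024-05-08T02:45:00"),
    ])
    print([str(d) for d in durations], [str(b) for b in between])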