Scaling guidelines for service assurance tests

Scheduled tests (STM)

NFM-P provides the ability to generate, manage and schedule STM tests within the network. This section provides guidelines that can be used to determine the extent to which STM tests can be scheduled and launched within a network.

A number of factors influence NFM-P’s ability to concurrently manage and schedule a large number of tests. NFM-P keeps track of how many tests are running concurrently so that it can limit the initiation of tests and the processing of results, and keep them from interfering with the system’s other functions.

To understand the STM guidelines, the following terminology is required:

Elemental Test: An OAM test to be sent to a router, such as an LSP ping.

Elemental Test Result: An OAM test result received from a network element.

Accounting file Test: An OAM test that is initiated in the default manner, but whose results are retrieved from the network element via FTP on a periodic basis.

Test Policy: A definition or configuration that tells NFM-P the specifics about how to generate a test. A test policy can contain multiple test definitions. The policies are used by test suites.

Test Suite: A collection of elemental tests that can be assigned to a specific schedule. Tests can be placed in one of three sections within a test suite: First Run, Generated, and Last Run. The sections are executed in that order. The tests within the First Run and Last Run sections can be configured to execute in parallel or sequentially. The tests in the Generated section are run by the system as concurrently as possible. If the Generated section contains tests from several different test definitions, all the tests belonging to one definition are executed before the tests of the next definition begin; within a definition, the system attempts to execute the tests as concurrently as possible. This is important to note, because a test suite containing a large number of tests in the Generated section (or in First Run or Last Run sections set to parallel) may tax the system. Part of the increased load comes from the greater amount of resources that the system needs in order to initiate, wait for, and process many tests concurrently. In addition, tests that cause a large amount of data to be returned from the routers place increased demands on NFM-P.

Schedule: A start time that can have a test suite or test suites assigned to it to produce scheduled tasks. When the schedule's start time is reached, the suite or suites assigned to it will commence. The schedule may be set to continuously repeat after a configurable period of time.

Scheduled Task: An instance of a test suite assigned to a schedule.

Non-NE Schedulable STM Tests: NFM-P provides the ability to execute and process results for non-NE schedulable tests. Non-NE schedulable tests are elemental tests that are not persistently defined on network elements; instead, they are defined and configured from NFM-P for each test execution. Elemental test results from non-NE schedulable tests are always regular (SNMP mediated) and share the same scale limits and considerations as regular scheduled STM tests.

Table 5-23: Maximum number of STM elemental test results

| NFM-P platform | Maximum regular STM elemental test results (SNMP mediated schedulable/non-NE schedulable) in a 15-minute period | Maximum accounting file STM elemental test results in a 15-minute period with results stored in the NFM-P database or NFM-P database and using logToFile | Maximum accounting file STM elemental test results in a 15-minute period using logToFile only |
|---|---|---|---|
| Distributed NFM-P configuration with minimum 8 CPU Core NFM-P server | 15 000 | 1 500 000 (1) | 1 500 000 (1) |
| Distributed NFM-P configuration. NOTE: It may be possible to achieve higher numbers depending on the NFM-P server activity and hardware platform | 6000 | 22 500 | 60 000 |
| Minimum Supported Collocated NFM-P configuration. NOTE: It may be possible to achieve higher numbers depending on the NFM-P server activity and hardware platform | 3000 | 1500 | 15 000 |

Notes:
  1. May require a dedicated disk or striped disks for the xml_output partition.

Guidelines for maximizing STM test execution

By default, NFM-P only allows test suites with a combined weight of up to 80 000 to execute concurrently. The test suite weights are shown in the NFM-P GUI’s Test Suites List window. If too many tests start at the same time, the combined weight exceeds this limit and tests are skipped. Ensuring the successful execution of as many STM tests as possible requires planning the schedules, the contents, and the configuration of the test suites. The following guidelines will assist in maximizing the number of tests that can be executed on your system:

Table 5-24: OAM test weight

| Test type | Weight |
|---|---|
| Regular Elemental STM Test | 10 per Test Packet |
| Accounting File Elemental STM Test | 1 |
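The weights in Table 5-24 can be added up to estimate whether a set of test suites fits under the default concurrency limit. The following Python sketch is illustrative only: the class and function names are hypothetical, and it assumes that a suite's weight is simply the sum of the weights of its elemental tests, which this section does not state explicitly. The weight shown in the Test Suites List window remains the authoritative value.

```python
from dataclasses import dataclass

DEFAULT_CONCURRENT_WEIGHT_LIMIT = 80_000  # default combined weight limit from the text
REGULAR_WEIGHT_PER_PACKET = 10            # Table 5-24: regular elemental STM test
ACCOUNTING_FILE_WEIGHT = 1                # Table 5-24: accounting file elemental STM test


@dataclass
class ElementalTest:
    accounting_file: bool   # True for an accounting file elemental STM test
    test_packets: int = 1   # only used for regular elemental tests


def elemental_weight(test):
    """Weight of one elemental test according to Table 5-24."""
    if test.accounting_file:
        return ACCOUNTING_FILE_WEIGHT
    return REGULAR_WEIGHT_PER_PACKET * test.test_packets


def suite_weight(tests):
    """Assumption: a suite's weight is the sum of its elemental test weights."""
    return sum(elemental_weight(t) for t in tests)


def fits_default_limit(suites):
    """True if the suites' combined weight stays within the default limit."""
    return sum(suite_weight(s) for s in suites) <= DEFAULT_CONCURRENT_WEIGHT_LIMIT


# Hypothetical example: 4 suites, each with 100 regular tests of 5 packets.
suite = [ElementalTest(accounting_file=False, test_packets=5)] * 100
print(suite_weight(suite), fits_default_limit([suite] * 4))
```

Under these assumptions, a regular test costs ten times as much per packet as an entire accounting file test, which is why moving tests to accounting file collection frees so much of the concurrency budget.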

Accounting file STM test configuration

Accounting file collection of STM test results requires 7750 SR and 7450 ESS network elements that are at version 7.0 R4 or later. To take advantage of accounting file STM test execution, the test policy must be configured to be NE schedulable with “Accounting file” selected. This produces STM tests that are executed on the network element, while the NFM-P server collects the test results by way of an accounting file, in a similar way to accounting statistics. Accounting file STM test results are collected by the NFM-P server only.

NFM-P supports the use of logToFile for file accounting STM results. When logToFile is the only method used to store the results, the number of tests that can be executed per 15-minute interval increases. See Table 5-23, Maximum number of STM elemental test results, for specific scaling limits. The logToFile method for file accounting STM results supports a maximum of two JMS clients.

Examples of STM test configuration

The following examples describe the configuration of STM tests on different network configurations.

Example 1:

Assume there is a network with 400 LSPs and that the objective is to perform LSP pings on each LSP as frequently as possible. Follow these steps:

  1. Create 4 test suites, each containing 100 elemental LSP ping tests.

  2. One at a time, execute each test suite and record the time each one took to complete. Assume that the longest time for executing one of the test suites is 5 minutes.

  3. Create a schedule that is ongoing and has a frequency of 15 minutes. This is three times the time taken by the longest test suite and ensures that the test will complete before it is executed again. Assign this schedule to the 4 test suites.

  4. Monitor the test suite results to ensure that they are completing. If the tests are not completing (for example, they are marked as “skipped”), then increase the frequency time value of the schedule.

  5. In the above case, there are 400 elemental tests configured to be executed every 15 minutes.
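The arithmetic behind Example 1 can be checked with a short Python sketch. The function name and its parameters are hypothetical; the sketch only restates the example's own numbers (4 suites of 100 tests, a longest run of 5 minutes, and a 15-minute schedule) and rejects a schedule period that does not exceed the longest measured run.

```python
def summarize_schedule(suite_sizes, longest_run_minutes, period_minutes):
    """Return (elemental tests per period, period / longest run time)."""
    if period_minutes <= longest_run_minutes:
        # A suite that is still running when its next run is due risks being skipped.
        raise ValueError("schedule period must exceed the longest suite run time")
    total_tests = sum(suite_sizes)
    headroom = period_minutes / longest_run_minutes
    return total_tests, headroom


total, headroom = summarize_schedule([100, 100, 100, 100], 5, 15)
print(total, headroom)  # 400 3.0
```

If monitoring shows skipped tests, rerunning the calculation with a longer period gives the new headroom factor before the schedule is changed.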

Example 2:

Assume there are eight test suites (T1, T2, T3, T4, T5, T6, T7 and T8), each containing 50 elemental tests. Assume the test suites individually take 5 minutes to run. Also, assume the objective is to schedule them so that the guideline of having fewer than 200 concurrently running elemental tests is respected.

The recommended approach for scheduling these test suites is as follows:

Factors impacting the number of elemental tests that can be executed in a given time frame

The following factors can impact the number of elemental tests that can be executed during a given time frame:

Possible consequences of exceeding the capacity of the system to perform tests

NFM-P will exhibit the following symptoms if the number of scheduled tests exceeds the system’s capacity:

Disk space requirements for STM test results

STM test results are stored in the tablespace DB partition. The STM database partitions start with a total size of 300 MB of disk space. When the number of test results to be stored is configured at the maximum of 20 000 000, the disk space requirement for the STM tests may increase by up to 80 GB. In that case, a larger tablespace partition should be considered.

The maximum number of test results stored in the database reflects the sum of the aggregate results, test results, and probe results.

Running 10 tests with 1 probe each versus 1 test with 10 probes consumes the same amount of disk space.

When using logToFile for accounting file STM test results, the maximum time-to-live on the disk is 24 hours. At the maximum collection rate of 1 500 000 test results per 15 minutes, the storage requirement on the NFM-P server in the xml_output directory is 600 GB per JMS client. The storage requirement doubles if the maximum number of JMS clients is used for file accounting STM results. The disk storage requirement can be decreased by using the compress option for logToFile, but this results in increased CPU utilization on the NFM-P server.
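The logToFile figures above can be turned into a rough sizing estimate. The Python sketch below is an illustration, not a sizing tool: the function name is hypothetical, and it assumes that storage scales linearly with the collection rate below the documented maximum, which this section does not state. Only the 600 GB per JMS client at 1 500 000 results per 15 minutes, and the limit of two JMS clients, come from the text.

```python
GB_PER_JMS_CLIENT_AT_MAX_RATE = 600   # from the text, uncompressed
MAX_RESULTS_PER_15_MIN = 1_500_000    # from the text
MAX_JMS_CLIENTS = 2                   # from the text


def xml_output_storage_gb(results_per_15_min, jms_clients):
    """Estimate xml_output storage in GB for logToFile STM results.

    Assumption: storage grows linearly with the collection rate.
    """
    if not 1 <= jms_clients <= MAX_JMS_CLIENTS:
        raise ValueError("logToFile supports one or two JMS clients")
    if not 0 <= results_per_15_min <= MAX_RESULTS_PER_15_MIN:
        raise ValueError("collection rate is outside the documented range")
    fraction_of_max = results_per_15_min / MAX_RESULTS_PER_15_MIN
    return GB_PER_JMS_CLIENT_AT_MAX_RATE * fraction_of_max * jms_clients


print(xml_output_storage_gb(1_500_000, 2))  # 1200.0
```

The estimate does not account for the compress option, which reduces disk usage at the cost of CPU on the NFM-P server.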

Scaling guidelines for OAM PM test results

See the NSP NFM-P Classic Management User Guide for details on OAM PM test configuration and result retrieval.

The quantity of resources allocated to the retrieval and processing of OAM PM test results within the NFM-P server is set at installation time and depends on the number of CPUs available to the NFM-P server software. The number of CPUs available to the NFM-P server depends on the number of CPUs on the station and on whether the NFM-P database software is collocated with the NFM-P server software on the same station.

The following tables provide the maximum number of OAM PM test results that can be retrieved and processed by the NFM-P server or NFM-P statistics auxiliary in various configurations.

Table 5-25: Maximum number of OAM PM test results processed by an NFM-P server

| Number of CPU cores on the NFM-P server | Maximum number of OAM PM test results per 15-minute interval, collocated configuration | Maximum number of OAM PM test results per 15-minute interval, distributed configuration |
|---|---|---|
| 6 | 100 000 | 200 000 |
| 8 or greater | 200 000 | 400 000 |
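For planning scripts, Table 5-25 can be expressed as a small lookup. The Python sketch below uses a hypothetical function name and makes one assumption that the table does not cover: a server with 7 cores is treated like the 6-core row, and fewer than 6 cores is rejected because the table lists no value for it.

```python
def max_oam_pm_results_per_15_min(cpu_cores, collocated):
    """Return the Table 5-25 limit for an NFM-P server."""
    if cpu_cores >= 8:
        collocated_limit, distributed_limit = 200_000, 400_000
    elif cpu_cores >= 6:
        # Assumption: 7 cores is treated the same as the 6-core row.
        collocated_limit, distributed_limit = 100_000, 200_000
    else:
        raise ValueError("Table 5-25 lists no value for fewer than 6 CPU cores")
    return collocated_limit if collocated else distributed_limit


def within_limit(planned_results, cpu_cores, collocated):
    """True if the planned results per 15-minute interval fit the table."""
    return planned_results <= max_oam_pm_results_per_15_min(cpu_cores, collocated)


print(within_limit(250_000, cpu_cores=8, collocated=False))  # True
```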

Table 5-26: Maximum number of OAM PM test results processed by an NFM-P statistics auxiliary

Number of active NFM-P statistics auxiliaries

Maximum number of OAM PM test results per 15-minute interval

OAM PM test result collection with NFM-P database

OAM PM test result collection with single auxiliary database

OAM PM test result collection with three+ auxiliary database cluster

logToFile only

| Active auxiliaries | 8 CPU cores, 32 GB RAM | 12 CPU cores, 32 GB RAM | 8 CPU cores, 32 GB RAM | 12 CPU cores, 32 GB RAM | 12 CPU cores, 32 GB RAM |
|---|---|---|---|---|---|
| 1 | 10 000 000 | 10 000 000 | 5 000 000 | 20 000 000 | 20 000 000 |
| 2 | 10 000 000 | 10 000 000 | 5 000 000 | 40 000 000 | 40 000 000 |
| 3 | 10 000 000 | 10 000 000 | 5 000 000 | 60 000 000 | 60 000 000 |

The table below shows the retention that is achievable depending upon the total number of test results to retain and the database used to retain the records:

Table 5-27: Maximum OAM PM test result retention

| Database to retain records | Total number of OAM PM test results to be stored in the database | Maximum number of retention intervals |
|---|---|---|
| NFM-P database | <40M | 672 |
| NFM-P database | >40M | 96 |
| NSP auxiliary database | N/A | 35 040 |
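The retention limits in Table 5-27 are easier to compare when converted to elapsed time. The Python sketch below assumes that each retention interval is 15 minutes long, matching the 15-minute collection interval used elsewhere in this section; the table itself does not state the interval length, and the function name is hypothetical.

```python
from datetime import timedelta

ASSUMED_INTERVAL = timedelta(minutes=15)  # assumption: one retention interval = 15 minutes


def retention_period(retention_intervals):
    """Convert a number of retention intervals into elapsed time."""
    return retention_intervals * ASSUMED_INTERVAL


for intervals in (672, 96, 35_040):
    print(intervals, retention_period(intervals).days)
# 672 7
# 96 1
# 35040 365
```

Under that assumption, the NFM-P database retains up to 7 days of results below 40M stored results and 1 day above it, while the NSP auxiliary database retains up to 365 days.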