What is a baseline?
Baselines
A baseline provides the logic for collecting baseline statistics and for detecting anomalies. It detects a historical trend and generates expected values based on that trend.
An anomaly is a value that deviates enough from a baseline’s expected value to be flagged for attention. You can configure anomaly thresholds as part of baseline creation.
Baselines and anomaly detection are included with NSP Baseline Analytics.
When you create baselines, you specify a resource or group of resources from which to collect a set of statistics over a defined time window. Information collected during that window is used to calculate a data point for the baseline. You also define a season, which is the length of time over which statistics must be measured to assess trends. For example, to assess network traffic, you could set up a 15-minute window with a one-week season, which provides values calculated every 15 minutes over a one-week period.
Creating a baseline also creates a baseline subscription to collect the required data.
Note: Baseline subscriptions and telemetry subscriptions are separate. A baseline cannot be generated from data collected by a telemetry subscription.
On-demand NFM-P statistics cannot be used to create baselines.
A baseline consists of the following components. The components appear in the Create and Edit forms:
Baselines are created on a per-resource basis. A resource is an entity that can collect the desired statistics. In the Create Baselines form, configure the required parameters and choose the resources to collect the statistics.
If the NE is managed using MDM, configuring a baseline initiates statistics collection. If the NE is managed by the NFM-P, statistics collection must be configured on the NFM-P, and the resource must already be collecting the desired statistics before a baseline can be created.
Note: Baseline Analytics is different from the NSP Analytics application.
Baseline Analytics provides near-real-time baseline and anomaly detection from telemetry counters, for example, received octets for the /telemetry:base/interfaces/interface telemetry type.
The Analytics application computes a baseline for data configured for reporting, for example, utilization and throughput for a port in a Port LAG Details report, or bandwidth and data for an application group in a Router Level Usage Summary report with Baseline. See the Analytics Report Catalog and the NSP User Guide for more information about Analytics.
General parameters
The general parameters of a baseline include the following:
-
Description
Add an optional description for filtering on the Baselines view.
-
Collection Interval
For MDM-managed NEs, this represents the interval at which to collect the statistics, for example, every 30 seconds. For NEs managed by the NFM-P, this value is ignored and statistics are collected according to the settings configured in the NFM-P.
-
Season
A season is the length of time statistics must be collected for a pattern to be seen. For example, for network traffic you can expect the data pattern to repeat on a weekly basis.
-
Window Duration
Window duration is the size of the data bucket for telemetry calculation. For example, a counter calculates the change between the first and last values taken during the window. The calculation used depends on the counter type parameter in the Filters & Counters panel.
-
Admin State and Training Status
These parameters are enabled by default when a baseline is created. They can be changed in the Edit form.
-
If the Admin State of a baseline is Enabled, NSP is monitoring the statistics.
-
If the Training Status is Active, NSP is incorporating new information into the baseline’s model. If the Training Status is Paused, future anomalies will be detected against the expected values that are already calculated.
If you are monitoring error counters, such as packet loss, you can pause training after a season with no errors. This sets the expected number of errors to zero while monitoring continues.
-
For example, if you create a baseline and set the Collection Interval to 30 seconds, the Season to 1 week, and the Window Duration to 15 minutes, the baseline subscription collects the statistics values every 30 seconds, calculates a baseline data point every 15 minutes, and assesses trends based on one week of data.
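The arithmetic behind this example can be sketched as follows. The function name summarize_schedule and its dictionary keys are hypothetical and are not part of NSP. The check that each duration divides evenly into the next is an assumption made to keep the sketch simple, and the sample count is approximate because it ignores how readings on a window boundary are assigned.

```python
def summarize_schedule(collection_interval_s: int, window_s: int, season_s: int) -> dict:
    """Relate the three timing parameters of a baseline (all in seconds)."""
    if min(collection_interval_s, window_s, season_s) <= 0:
        raise ValueError("all durations must be positive")
    # Assumption of this sketch: each duration divides evenly into the next.
    if window_s % collection_interval_s or season_s % window_s:
        raise ValueError("each duration should divide evenly into the next")
    return {
        # Collected samples that feed one baseline data point.
        "samples_per_data_point": window_s // collection_interval_s,
        # Baseline data points calculated over one season.
        "data_points_per_season": season_s // window_s,
    }


# 30-second collection, 15-minute window, one-week season.
schedule = summarize_schedule(30, 15 * 60, 7 * 24 * 60 * 60)
print(schedule)  # {'samples_per_data_point': 30, 'data_points_per_season': 672}
```

With these settings, each data point summarizes about 30 collected values, and one season contains 672 data points.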
Filter & Counters parameters
As you proceed through baseline creation, more parameters become available on the UI. The Filter & Counters parameters declare the telemetry values to be collected, the counter types, and the resources of interest.
After you select a telemetry type, the COUNTERS button becomes available.
You can configure one of the following counter types:
-
Counter: a counter takes input counter values and calculates the change in value over the window. For example, if the counter represents the number of transmitted octets and windows are 15 minutes, a counter baseline value is the number of octets transmitted over 15 minutes.
-
Gauge: a gauge takes input values and calculates the mean value over the window. For example, if the input value is octets per second over a 30 s period and windows are 15 minutes, a gauge baseline value is the mean octets per second over 15 minutes.
An example of a gauge is CPU usage; it is a bounded value between 0 and 100%.
-
Sampled: a sampled baseline takes sampled values and calculates the sample mean value over the window. Sampled values represent the value at the exact time the sample was taken, not the value since the last sample was taken. For example, if CPU % is sampled every 2 minutes and windows are 15 minutes, a sampled baseline value is the mean of the samples collected in the 15 minutes.
An example of a sampled value is latency.
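The three counter types can be illustrated with a minimal sketch that reduces the values collected in one window to a single data point. The function names are hypothetical, and the sketch ignores details that a real collector would need to handle, such as counter wrap or missing samples.

```python
from statistics import fmean
from typing import Sequence


def counter_value(samples: Sequence[float]) -> float:
    """Counter: change between the first and last readings in the window."""
    if len(samples) < 2:
        raise ValueError("a counter needs at least two readings")
    # Counter wrap or reset is not handled in this sketch.
    return samples[-1] - samples[0]


def gauge_value(samples: Sequence[float]) -> float:
    """Gauge: mean of the input values over the window."""
    return fmean(samples)


def sampled_value(samples: Sequence[float]) -> float:
    """Sampled: sample mean of the point-in-time values in the window."""
    return fmean(samples)


# Cumulative transmitted octets read during one window.
print(counter_value([1000, 1600, 2500, 4000]))  # 3000
# Gauge readings (for example, a rate or percentage) during one window.
print(gauge_value([40.0, 50.0, 60.0]))  # 50.0
```

In this sketch the gauge and sampled calculations are both a mean; the difference lies in what each input value represents, as described above.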
Configure an object filter as needed to filter the available resources; see How do object filters work?.
When at least one counter is added and a counter type is specified, the VERIFY RESOURCES button becomes available.
Detectors
A detector defines the rules for anomaly detection. The detector rule provides an acceptable range of expected values. If a measured value falls outside the range, it is marked anomalous.
Anomaly detection is optional.
A detector rule is composed of the following:
-
algorithm — the formula to use to compare the expected and measured values
-
evaluate what — value, rate, or bandwidth
The measured values may be converted to a rate or bandwidth to perform the evaluation.
The Comparison and Threshold parameters define the range of acceptable values. For example, a rule could state that a value with an absolute Z-score greater than 2 is an anomaly.
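As a rough sketch of how such a rule could be evaluated, the following code combines an algorithm, a comparison, and a threshold. The DetectorRule class, the comparison strings, and the function names are hypothetical and do not reflect the NSP data model or API.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Hypothetical comparison names mapped to Python operators.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class DetectorRule:
    # (measured, expected, stddev) -> score
    algorithm: Callable[[float, float, float], float]
    comparison: str
    threshold: float

    def is_anomaly(self, measured: float, expected: float, stddev: float) -> bool:
        """Apply the algorithm, then compare the score to the threshold."""
        score = self.algorithm(measured, expected, stddev)
        return COMPARISONS[self.comparison](score, self.threshold)


def absolute_z_score(measured: float, expected: float, stddev: float) -> float:
    """Absolute number of standard deviations from the expected value."""
    return abs((measured - expected) / stddev)


# "A value with an absolute Z-score greater than 2 is an anomaly."
rule = DetectorRule(absolute_z_score, ">", 2.0)
print(rule.is_anomaly(130.0, 100.0, 10.0))  # True  (score 3.0)
print(rule.is_anomaly(115.0, 100.0, 10.0))  # False (score 1.5)
```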
Algorithms
You can define a rule based on an algorithm.
The following algorithms are suitable for most purposes:
-
This refers to the Z-score (number of standard deviations) of the measured value against the expected values. In this case, the expected value is the mean.
Formula: (measured - expected) / stddev
-
This refers to the absolute value of the Z-score of the measured value against the expected values. In this case, the expected value is the mean.
Formula: |(measured - expected) / stddev|
The Z-score algorithms are useful because they incorporate the standard deviation. In addition to recording how far the current value is from the mean, the algorithm also factors in the variability of the values. This can be very important when deciding if a value is anomalous. If your values are highly variable, that is, the standard deviation is high, it is important to choose a Z-score algorithm.
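The effect of variability can be seen in a small sketch of the two Z-score formulas. The function names are hypothetical, and the guard against a non-positive standard deviation is an assumption of the sketch.

```python
def z_score(measured: float, expected: float, stddev: float) -> float:
    """Signed number of standard deviations from the expected value."""
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    return (measured - expected) / stddev


def absolute_z_score(measured: float, expected: float, stddev: float) -> float:
    """Z-score without its sign."""
    return abs(z_score(measured, expected, stddev))


# The same deviation of 20 scores very differently depending on variability.
print(z_score(120.0, 100.0, 5.0))           # 4.0  (low variability)
print(z_score(120.0, 100.0, 40.0))          # 0.5  (high variability)
print(absolute_z_score(80.0, 100.0, 5.0))   # 4.0  (sign removed)
```

With a threshold of 2, the first value would be flagged and the second would not, even though both are 20 above the expected value.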
You can also use one of the following:
-
This is the absolute difference between the measured and expected values, divided by the absolute value of their arithmetic mean.
Formula: |measured - expected| / (|measured + expected| * 0.5)
This algorithm could be suitable if the standard deviation is very small, that is, if there is very little variation in the values.
-
This is the relative change (including the positive or negative sign) between the measured and expected values.
Formula: (measured - expected) / |expected|
-
This is the relative change (with no sign) between the measured and expected values.
Formula: |measured - expected| / |expected|
-
This is the signed difference between the measured and expected values, divided by the absolute value of their arithmetic mean.
Formula: (measured - expected) / (|measured + expected| * 0.5)
-
This is a score that becomes more sensitive as the measured or expected value approaches ±100. This detector algorithm works well with percentages, although it may also be useful with other types of values.
Formula: (sign(measured - expected) * (|measured - expected| + max(measured,expected))) / 200
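The five formulas above can be transcribed directly into code, as in the following sketch. The function names are hypothetical labels chosen for readability and are not the algorithm names used in the product. The zero-denominator checks are an assumption of the sketch.

```python
def _nonzero(value: float) -> float:
    """Reject a zero denominator (assumption: such inputs are invalid here)."""
    if value == 0:
        raise ValueError("denominator is zero")
    return value


def relative_difference(measured: float, expected: float) -> float:
    # |measured - expected| / (|measured + expected| * 0.5)
    return abs(measured - expected) / _nonzero(abs(measured + expected) * 0.5)


def relative_change(measured: float, expected: float) -> float:
    # (measured - expected) / |expected|
    return (measured - expected) / _nonzero(abs(expected))


def absolute_relative_change(measured: float, expected: float) -> float:
    # |measured - expected| / |expected|
    return abs(measured - expected) / _nonzero(abs(expected))


def signed_relative_difference(measured: float, expected: float) -> float:
    # (measured - expected) / (|measured + expected| * 0.5)
    return (measured - expected) / _nonzero(abs(measured + expected) * 0.5)


def percent_score(measured: float, expected: float) -> float:
    # (sign(measured - expected) * (|measured - expected| + max(measured, expected))) / 200
    diff = measured - expected
    sign = (diff > 0) - (diff < 0)  # -1, 0, or 1
    return sign * (abs(diff) + max(measured, expected)) / 200


print(relative_change(120.0, 100.0))           # 0.2
print(absolute_relative_change(80.0, 100.0))   # 0.2
print(percent_score(90.0, 50.0))               # 0.65
print(percent_score(50.0, 90.0))               # -0.65
```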
Baseline charts
A baseline chart plots captured telemetry data for a specified period, along with the baseline expected values and the range around the expected value that the anomaly detectors consider normal, that is, the range of values that are not anomalies.
Anomalies are indicated where they are detected.
Note: A baseline must be trained for at least one season before expected values can be charted. Detection of anomalies requires one or more seasons depending on the detector.
Baseline and anomaly charts and the list of anomalies are available in Data Collection and Analysis Visualizations.
To open Data Collection and Analysis Visualizations for a baseline, choose the baseline and click (Table row actions), Open in Data Collection and Analysis Visualizations. The view opens to a New Chart form with the baseline selected.
RESTCONF APIs are available for Baseline Analytics; see the Network Infrastructure Management API documentation on the Network Developer Portal.
To plot a baseline chart, see How do I plot a baseline chart?.
To chart an anomaly, see How do I chart an anomaly?.
Baseline Analytics data storage
Baseline data is stored in Postgres unless an auxiliary database is enabled, in which case all collected data is stored in the auxiliary database.
The following data is stored:
-
statistics data collected during the configured window; see General parameters
By default, data is stored in Postgres for 35 days and in the auxiliary database for 90 days. These values can be changed using the RESTCONF API or by updating the age-out policy; see How do I edit an age-out policy?.