Application Assurance — Best Practices for ISA and Host IOM Overload Protection

This chapter provides information about Application Assurance best practices for ISA and host IOM overload protection.

Applicability

The information and configuration in this chapter are based on SR OS Release 12.0.R4.

Overview

The multiservice integrated services adapter (MS-ISA) is a processing resource module installed on an ISA host IOM. This example describes the best practices for configuration and monitoring of the system to ensure proper engineering of the system resources involved in AA ISA capacity planning.

As shown in Figure 1 (System packet datapath to AA ISA), traffic is diverted to an AA ISA by provisioning an application profile (app-profile) for a subscriber or SAP service context. SR OS then automatically handles traffic diversion for both directions of traffic for that AA subscriber context, through one of the AA ISAs in the AA group where that app-profile is defined.

Figure 1. System packet datapath to AA ISA

The following elements in the SR OS node must be properly engineered for any given AA deployment. Each element is described in this section:

  1. ISA capacity cost and load balancing across ISAs.

  2. ISA host IOM network egress QoS. Host IOM egress network ports weighted-average shared buffer pool thresholds (within the egress QoS configuration for each group) are used for overload cut-through processing.

  3. ISA resources and statistics collection.

    • Flows

    • Traffic volume (bandwidth)

    • Subscribers

    • Flow setup rate

    • ISA overload cut-through

    • ISA default subscriber policies

ISA capacity planning approach

This example illustrates an approach to the configuration of the AA system to address these considerations:

  • IOM/ISA-AA network egress QoS configuration should be designed to treat the ISA as a network port with normal network port maximum delay (by MBS).

  • Within the ISA, fair access to the ISA-AA bandwidth and flow resources must be ensured: it is recommended that default application QoS policy (AQP) policy entries be configured limiting bandwidth and flow resources per AA subscriber.

  • Thresholds for SNMP alerts that indicate a high load on ISA processing should be configured: capacity cost, flow, bandwidth.

  • Capacity tracking in live deployments should be performed for parameters that can affect overload: flow setup rate, bandwidth, and subscriber-count per ISA.

  • Use of other scale-related consumable AA resources should be planned and periodically tracked against system maximum limits. This includes parameters such as statistics records, transit-ip table entries, and transit-prefix TCAM entries. These limits will not affect overload of the ISA but may affect intended service operation.

  • For recommendations of the specific parameters to watch in a given deployment as well as the values of the system limits for a given release, contact your regional support organization.

AA overload and resource monitoring

Overload is a condition where the total packet processing requirements for traffic arriving on a given ISA exceed the available resources, resulting in the host IOM egress buffers reaching a configured ‟overload” threshold. Above this threshold, the ISA can be configured to forward excess traffic (called overload cut-through). If cut-through is not enabled and the overload condition continues, the egress queue MBS threshold will eventually be reached, after which packets will be discarded. Even if overload cut-through is enabled, any egress traffic that exceeds the maximum bus capacity of the ISA queue discard threshold will be discarded.

ISA capacity overload events are supported within the system resource monitoring and logging capabilities if the traffic and resource load crosses any of the following high and low load thresholds on a per-ISA basis. Exceeding one of these thresholds does not in itself indicate an overload state.

  1. Host IOM egress network ports weighted-average shared buffer pool thresholds (within the egress QoS configuration for each AA group) are used for triggering and removing overload cut-through processing. Care should be taken in the configuration of these buffers, as the IOM flexpath has significant buffer capacity that can result in latency larger than the network SLA acceptable guidelines. A properly engineered configuration will have large enough buffering to not trigger ISA overload unnecessarily (due to normal bursts with a reasonable traffic load) but will not incur excessive latency prior to triggering the overload state.

  2. ISA capacity cost: if the capacity cost of all subscribers on the ISA exceeds a threshold, an event is raised but the overload condition is not set (unless other resources are exhausted). ISA overload or traffic cut-through does not occur simply if capacity cost is exceeded. It is used to capacity plan an intended load for the ISA, proportional to resource use per subscriber, in order to generate events prior to overload to allow appropriate action to mitigate the resource consumption (such as provisioning more ISAs).

  3. Flow table consumption (number of allocated flow resources in use): the flow table high-watermark threshold warnings are for proactive notification of a high load. The ISA will cut-through new flows when the ‟flow resources in use” is at the maximum flow limit. Reaching the flow limit does not generate backpressure to the IOM, nor is the ISA considered in an overload state. Flow usage thresholds are different from bit-rate/packet-rate/flow-setup-rate thresholds in that when the flow table high-watermark threshold is exceeded, the ISA will no longer be operating as application-aware for the flows with no context. The default subscriber policy is applied to traffic that required a flow record but was unable to allocate one, which is a similar behavior to overload cut-through.

    The following terms are used to describe flow resources:

    • Maximum flows: the maximum AA flow table size for a given release.

    • Flows: on the show screens, the ‟flows” field is an indication of the number of unique 5-tuple entries in the flow table. This includes active and inactive flows; inactive will age out of the table after a period of inactivity that is dependent on the protocol used.

    • Active flows: the number of flows with traffic in the current reporting interval.

    • Flow resources in use: the number of allocated flows in the flow table. This number is greater than the number of active flows, reflecting inactive flows and flows pre-allocated for some dynamic protocols (control + data channels) and for some UDP traffic.

  4. Traffic volume: traffic rate in bytes/sec and packets/sec is the dominant cause of ISA overload in most network scenarios, when the ISA is presented with more traffic than it can process. This results in the ISA internal ingress buffers reaching a threshold that causes backpressure to the IOM egress queues (toward the ISA), allowing the ISA to process the packets it already has. This internal backpressure mechanism is normal behavior, allowing burst tolerance at the IOM-to-ISA interface; thus backpressure is not in itself an indication of overload. Overload occurs when the bursts or the load of traffic is sustained long enough to reach the ISA host IOM network port egress weighted-average shared buffer threshold. The actual amount of traffic that can be passed through an ISA is dependent on the application traffic mix, flow density, and AA policy configurations and will vary by network type and by region. The bit-rate and packet-rate watermarks can be used to provide event notification when the traffic rates exceed planning expectations.

  5. Flow setup rate: this is generally proportional to total traffic volume, and as such can be a factor in ISA overload. The flow setup rate is the rate at which new flows are presented to the ISA, each resulting in additional tasks that are specific to flow state creation; thus the ISA has a sensitivity to flow setup rates as fewer cycles are available for other datapath tasks when the flow setup rate is high. In residential networks, flow setup rates of 3 k to 5 k flows/sec per Gbps of traffic are common. The flow setup watermarks can be used to provide event notification when the rate exceeds planning expectations.
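The rule of thumb in item 5 can be turned into a quick planning estimate. The following is a minimal sketch, assuming the 3 k to 5 k flows/sec per Gbps range quoted above for residential networks; the function name and its parameters are illustrative and not part of SR OS.

```python
def expected_flow_setup_range(traffic_mbps, low_per_gbps=3000, high_per_gbps=5000):
    """Return the (low, high) expected flow setups per second for a traffic load.

    The per-Gbps defaults are the residential range quoted in the text;
    other network types may need different values.
    """
    if traffic_mbps < 0:
        raise ValueError("traffic_mbps must be non-negative")
    # Convert Mbps to Gbps while scaling by the per-Gbps flow setup rate.
    low = traffic_mbps * low_per_gbps / 1000
    high = traffic_mbps * high_per_gbps / 1000
    return low, high


low, high = expected_flow_setup_range(7600)
print(f"At 7600 Mbps expect roughly {low:.0f} to {high:.0f} flow setups/sec")
```

A measured flow setup rate well above the upper bound for the observed traffic rate would suggest that the traffic mix differs from the residential norm and that the flow setup watermarks deserve a closer look.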

ISA Overload Models

For an ISA overload strategy, there are two design options for configuring the overload behavior of the system:

  • Host IOM egress discards: in this model, the philosophy is to treat AA packet processing resources in the same way as a network interface (of somewhat variable capacity depending on the traffic characteristics). When too much traffic is presented to the ISA, it backpressures the host IOM egress, which will buffer packets. If the egress buffer thresholds are exceeded, the ISA will discard according to the egress QoS slope policy. This is configured by not enabling isa-overload-cut-through and use of appropriate egress QoS policies. Firewall or session filter deployments may use this model.

  • Overload cut-through: the ISA group can be enabled to cut-through some traffic if an overload event occurs, triggered when the IOM network port weighted-average queue depth exceeds the weighted-average shared high-watermark threshold. In this ISA state, some packets are cut-through from application analysis but retain subscriber context with the default subscriber policy applied. This mode of deployment is intended for situations where it is preferable to forward packets, even if not identified by AA, rather than to drop them. For example, if AA is providing value-added services (VAS) such as In-Browser Notification (IBN), analytics, or traffic rate limiting, this would usually be the preferred model as the underlying service should be preserved even if capacity to provide the VAS is not available.

Note that even with overload cut-through enabled, there is a hardware-based maximum ISA throughput of approximately 11 Gbps for MS-ISA and 40 Gbps for MS-ISA2. If this is exceeded on a sustained basis, IOM egress discards may still occur.

Understanding Packet and Protocol Cut-Through

Traffic can be cut-through the ISA-AA card on a packet-by-packet basis, in which case packets do not go through AA identification and subscriber application policy. The conditions that trigger cut-through include:

  • Overload (IOM egress network port weighted-average shared buffer threshold): excess traffic bypasses all AA processing except for the default subscriber policy

  • Non-conformant IP packet: traffic bypasses all AA processing except IP protocol checks and the default subscriber policy. Optionally, these packets can be discarded in AA.

  • Flow table full: for new 5-tuples sent to the ISA, if the flow table is full, the packets are cut-through the ISA and only the default subscriber policy is applied.

Note:

The default subscriber policy is a set of AQP rules that apply AQP match criteria limited to Application Service Options (ASO), aa-sub, and traffic-direction starting with the first packet of a flow, with no match conditions based on AA identification (application, app-group, charging-group, IP header). Packets will be either denied_by_default_policy or cut_through_by_default_policy, depending on the policer action configuration in the AQP rules.

For cut-through traffic, no flow records exist but it is counted under per-subscriber protocol statistics as one of the following counters, depending on the case:

  • cut_through — Statistics for any packet that could not map to a flow, but that has a valid subscriber ID. This can be an error packet, fragmented out-of-order, no flow resource, invalid TCP flags, etc. This is the most important count for indicating overload cut-through, as it counts all traffic in overload cut-through mode (when the weighted-average threshold has been crossed).

  • denied_by_default_policy — Packets that are dropped due to a default policy with a flow-based policer (flow rate or flow count) with action discard.

  • cut_through_by_default_policy — Packets that failed to pass flow-based policers with an action of priority-mark.

An example of overload cut-through statistics in the CLI is shown below:

A:BNG# show application-assurance group 1 protocol count
===============================================================================
Application-Assurance Protocol Statistics
===============================================================================
Protocol                         Disc          Octets       Packets      Flows
-------------------------------------------------------------------------------
advanced_direct_connect            0%               0             0          0
aim                                0%               0             0          0
amazon_video                       0%               0             0          0
ares                               0%               0             0          0
bbm                                0%               0             0          0
betamax_voip                       0%               0             0          0
bgp                                0%               0             0          0
bittorrent                         0%       678428534       5322929    1036129
cccam                              0%               0             0          0
citrix_ica                         0%               0             0          0
citrix_ima                         0%               0             0          0
cnnlive                            0%               0             0          0
cups                               0%               0             0          0
cut_through                        0%      5299435739      10603771          0
cut_through_by_default_policy      0%               0             0          0
cvs                                0%               0             0          0
daap                               0%               0             0          0
dcerpc                             0%               0             0          0
denied_by_default_policy           0%               0             0          0
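The counters in this output can be tracked automatically. The following is a minimal sketch, assuming the column layout shown above (protocol name, discard percentage, octets, packets, flows); the sample text is an excerpt of that output, and the helper names are illustrative.

```python
SAMPLE = """\
Protocol                         Disc          Octets       Packets      Flows
bittorrent                         0%       678428534       5322929    1036129
cut_through                        0%      5299435739      10603771          0
cut_through_by_default_policy      0%               0             0          0
denied_by_default_policy           0%               0             0          0
"""


def parse_protocol_counts(text):
    """Map protocol name to its octet, packet, and flow counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns and a percentage in the second one;
        # headers and separator lines are skipped.
        if len(fields) != 5 or not fields[1].endswith("%"):
            continue
        try:
            octets, packets, flows = (int(f) for f in fields[2:])
        except ValueError:
            continue
        stats[fields[0]] = {"octets": octets, "packets": packets, "flows": flows}
    return stats


def cut_through_share(stats):
    """Fraction of the parsed octets counted under cut_through."""
    total = sum(row["octets"] for row in stats.values())
    if total == 0:
        return 0.0
    return stats.get("cut_through", {"octets": 0})["octets"] / total


stats = parse_protocol_counts(SAMPLE)
print(f"cut_through share of parsed octets: {cut_through_share(stats):.1%}")
```

Because the sample is only an excerpt, the computed share covers the listed protocols rather than all traffic in the group; a steadily growing cut_through counter is the signal of interest.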

Configuration

This example illustrates a typical configuration of an SR OS node for AA for each of the configuration topics.

AA traffic load test environment

Application assurance identifies every byte and every packet of hundreds of real-world applications using per-flow stateful analysis techniques. It is a challenge to find test equipment that can accurately emulate full scale (10 Gbps to 40 Gbps) with traffic mixes and flow behaviors representing hundreds of thousands of end users with application clients across a range of devices. Some specialized stateful test equipment can emulate large traffic rates, but even the best will have equipment-specific patterns and behaviors not representative of live traffic. Therefore, the best scenario to engineer the AA overload configuration is by iteration in live deployments: setting an initial target and modifying the configuration based on ISA performance under load.

For a lab test of ISA throughput and loading, Nokia uses stateful test equipment which supports emulation of various service provider traffic mix profiles suitable for generating overload conditions; however, it is outside the scope of this document to configure AA throughput tests.

The operator should be aware that use of unrealistic, non-stateful traffic generators can result in a high level of unknown traffic, with the ISA performance impacted by continually trying to identify large numbers of packets of no real application type. This, combined with cut-through for invalid IP packets, can result in ISA overload and traffic cut-through (due to overload or invalid IP packets) at traffic levels not representative of actual ISA performance on real traffic.

ISA capacity cost and load balancing across ISAs

These AA group-level commands define the load balancing parameters within an ISA group.

*A:BNG# configure isa 
    application-assurance-group 1 aa-sub-scale residential create
        no description
        no fail-to-open
        isa-capacity-cost-high-threshold 304000
        isa-capacity-cost-low-threshold 272000
        partitions
        divert-fc be
        no shutdown
    exit

The following should be noted related to this configuration:

  • Up to 7 primary and 1 backup ISAs are allowed. If the AA services are considered ‟value added” and not part of a paid service, backups are usually not used because the ‟fail to fabric” capability keeps the underlying service running.

  • The default behavior in case of ISA failure is ‟no fail-to-open”, which means ‟fail-to-wire”: if an ISA fails, traffic is forwarded as if no divert was configured.

  • Threshold for sending capacity-cost SNMP traps: the unit used for capacity cost is a variable defined in the network design; in this example, it is expressed in Mbps of the subscriber total BW UP+DOWN with a high watermark set to 7600 Mbps × 40 = 304000 (where 40 is an oversubscription ratio). The low watermark is equal to 6800 Mbps × 40 = 272000.

  • Partitions should always be enabled so that additional policies can be configured in the future (for example, wifi/business).

  • The divert-fc configuration applies to the AA group: in this example, FC BE (Internet) is the only diverted FC; this is typical for AA residential and WLAN-GW deployments. For VPN services, typically all datapath FCs are diverted to AA.
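The capacity-cost threshold arithmetic in the notes above can be reproduced as follows. This is a minimal sketch, assuming capacity cost is expressed in Mbps of subscriber total bandwidth (UP+DOWN) as in this example; the function names and the 100 Mbps subscriber plan are hypothetical.

```python
def capacity_cost_threshold(target_isa_mbps, oversubscription_ratio):
    """Capacity-cost threshold for a target ISA load and oversubscription ratio."""
    return target_isa_mbps * oversubscription_ratio


def subscribers_before_threshold(threshold, cost_per_subscriber):
    """Number of equal-cost subscribers that fit at or below the threshold."""
    return threshold // cost_per_subscriber


high = capacity_cost_threshold(7600, 40)
low = capacity_cost_threshold(6800, 40)
print(f"isa-capacity-cost-high-threshold {high}")
print(f"isa-capacity-cost-low-threshold {low}")
# Hypothetical plan: each subscriber costs 100 (Mbps UP+DOWN).
print(subscribers_before_threshold(high, 100), "subscribers reach the high threshold")
```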

ISA-AA host IOM - network egress shared memory and QoS

The amount of shared memory allocated per port, along with the network port egress QoS policy, determine the maximum delay for traffic diverted to Application Assurance.

This maximum network port delay is typically determined by the operator and must be used to define the proper QoS configuration to apply to the ISA-AA ports; this QoS configuration may be the same (typically) as what is applied to regular network ports on the SR OS node.

On the line cards there is shared network egress memory per ISA-AA port, with the ISA-AA represented by two network ports on the host IOM:

  • ‟from-sub”: for traffic sent from the subscriber to the network

  • ‟to-sub”: for traffic sent from the network to the subscriber

    
    configure isa application-assurance-group 1 
                qos 
                    egress
                        from-subscriber
                            pool
                                slope-policy "default"
                                resv-cbs default
                            exit
                            queue-policy "network-facing-egress"
                            port-scheduler-policy "network-facing"
                        exit
                        to-subscriber
                            pool
                                slope-policy "default"
                                resv-cbs default
                            exit
                            queue-policy "network-facing-egress"
                            port-scheduler-policy "network-facing"
                        exit
                exit
                no shutdown
    

The amount of shared memory reserved for each egress network port is determined by the speed of the port (10 Gbps for MS-ISA and 40 Gbps for MS-ISA2) and the egr-percentage-of-rate ratio configuration.

MS-ISA uses by default 1000% and 500% of the rate respectively for to-sub and from-sub ports, while MS-ISA2 uses by default 100% for both to-sub and from-sub ports.

It is typically recommended that these values be adjusted when MS-ISA and a high-speed Ethernet MDA are mixed on the same IOM, because in this context the amount of shared memory allocated to the Ethernet MDA should be increased by reducing the MS-ISA network ports memory allocation ratio. If two MS-ISAs are installed on the same IOM, the system will by default allocate 50% of the network egress shared memory to each ISA. In addition, an operator may adjust these values in case the actual network-to-subscriber versus subscriber-to-network ratio is significantly different in the production network, in order to achieve the expected maximum tolerated network delay.

The operator can modify the egr-percentage-of-rate per port using the following command:

A:BNG# configure port 1/2/fm-sub 
A:BNG>config>port# info detail 
----------------------------------------------
        modify-buffer-allocation-rate
            egr-percentage-of-rate 500
        exit
----------------------------------------------
A:BNG# configure port 1/2/to-sub 
A:BNG>config>port# info detail 
----------------------------------------------
        modify-buffer-allocation-rate
            egr-percentage-of-rate 1000
        exit

Network egress scheduling and queuing priority for all ISAs within a group is defined at the AA ISA group level.

The following example applies to an IOM hosting an ISA-AA and a 2 x 10G Ethernet MDA:

7750# configure port <slot>/<isa-aa-mda>/fm-sub
    modify-buffer-allocation-rate
        egr-percentage-of-rate 65

7750# configure port <slot>/<isa-aa-mda>/to-sub
    modify-buffer-allocation-rate
        egr-percentage-of-rate 130

In this example, the configuration defines:

  • from-sub — Approximately 190 msec worth of buffer at 2500 Mbps.

  • to-sub— Approximately 190 msec worth of buffer at 5000 Mbps.

  • The buffer can be further refined from the network QoS policy.
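The relationship between buffer size, drain rate, and maximum delay used in this example can be checked with simple arithmetic. The following is a minimal sketch; it only converts between a delay at a given rate and a buffer size in bytes, and does not model how SR OS derives the buffer size from egr-percentage-of-rate. The function names are illustrative.

```python
def buffer_bytes(rate_mbps, delay_ms):
    """Bytes of buffering that correspond to delay_ms when drained at rate_mbps."""
    bits = rate_mbps * 1_000_000 * delay_ms / 1000
    return bits / 8


def drain_delay_ms(buffer_size_bytes, rate_mbps):
    """Worst-case queuing delay in milliseconds for a full buffer drained at rate_mbps."""
    return buffer_size_bytes * 8 * 1000 / (rate_mbps * 1_000_000)


for port, rate in (("from-sub", 2500), ("to-sub", 5000)):
    size = buffer_bytes(rate, 190)
    print(f"{port}: 190 msec at {rate} Mbps is about {size / 1e6:.1f} MB of buffer")
```

Running the calculation in the other direction (drain_delay_ms) gives the delay that a known buffer allocation would add at the expected traffic rate, which can be compared with the maximum delay tolerated by the operator.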

For MS-ISA2, each MS-ISM flexpath will default the buffer allocation rate to 100%, which is a suitable value assuming that both modules in a slot are MS-ISA2 (which is the MS-ISM configuration), or that the I/O module has a similar traffic rate as the MS-ISA2 (which is also the case in the 10x10GE and 1x100GE versions of the MS-ISA2 line cards).

Configuring ISA resources and stats collection

The following are the key consumable resources in an AA ISA:

  • Flows

  • Bandwidth

  • Subscribers

  • Flow setup rate

The AA group should be configured with watermark thresholds where each ISA will generate SNMP events when resources reach this level.

  • Per-ISA-card resource usage watermarks trigger SNMP traps to the management system (5620 SAM)

  • The values defined below can be refined after the initial deployment, based on the network characteristics in terms of flows and bandwidth per ISA.

    
    7750# configure application-assurance
    ----------------------------------------------
            flow-table-low-wmark 90
            flow-table-high-wmark 95
            flow-setup-high-wmark 66500
            flow-setup-low-wmark 63000
            bit-rate-high-wmark 7600
            bit-rate-low-wmark 6800
    

In this example, the usage SNMP watermarks are configured for:

  • Flow table: 95%/90% (maximum 4M flows on MS-ISA)

  • Flow setup rate: configured to 95%/90% (of maximum 70k fps on MS-ISA)

  • Bit rate/total diverted throughput
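The flow setup watermark values in this example follow directly from the percentages. The following is a minimal sketch, assuming the 70k flows/sec MS-ISA maximum quoted above and treating the 4M flow table maximum as 4,000,000 entries (an assumption; the exact limit for a given release should be confirmed with your regional support organization).

```python
def watermark(maximum, percent):
    """Absolute watermark value for a percentage of a resource maximum."""
    return maximum * percent // 100


MAX_FLOW_SETUP_RATE = 70_000   # flows/sec on MS-ISA, from the text
MAX_FLOWS = 4_000_000          # assumed reading of "4M flows"

print("flow-setup-high-wmark", watermark(MAX_FLOW_SETUP_RATE, 95))
print("flow-setup-low-wmark", watermark(MAX_FLOW_SETUP_RATE, 90))
# The flow table watermarks are configured as percentages; the absolute
# values below only show how many flows those percentages represent.
print("flow table high watermark is reached at", watermark(MAX_FLOWS, 95), "flows")
print("flow table low watermark is reached at", watermark(MAX_FLOWS, 90), "flows")
```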

The show application-assurance group 1 status detail command is used to display basic ISA health status:

  • number of aa-subs, active aa-subs, bit rate, flows in use, and flow setup rate

  • statistics for all ISAs combined or per ISA

    
    A:BNG# show application-assurance group 1 status detail 
      ===============================================================================
      Application-Assurance Status
      ===============================================================================
      Last time change affecting status : 05/30/2014 17:18:34
      Number of Active ISAs             : 4
      Flows                             : 214007945881
      Flow Resources In Use             : 2955164
      AA Subs Created                   : 70567
      AA Subs Deleted                   : 10544
      AA Subs Modified                  : 0
      Seen IP Requests Sent             : 0
      Seen IP Requests Dropped          : 0
      -------------------------------------------------------------------------------
                                          Current    Average    Peak
      -------------------------------------------------------------------------------
      Active Flows                      : 2911508    2769454    4582522
      Flow Setup Rate (per second)      : 33923      29400      67865
      Traffic Rate (Mbps)               : 7620       7238       22628
      Packet Rate (per second)          : 1254138    1182571    3044376
      AA-Subs Downloaded                : 69887      66129      70567
      Active Subs                       : 23131      19737      38114
      -------------------------------------------------------------------------------
                                          Packets               Octets
      -------------------------------------------------------------------------------
      Diverted traffic                  : 7437950197613         5530634242355947
      Diverted discards                 : 0                     0
          Congestion                    : 0                     0
          Errors                        : 0                     N/A
      Entered ISA-AAs                   : 7437950180191         5530634229794634
      Buffered in ISA-AAs               : 22                    29849
      Discarded in ISA-AAs              : 97790                 47801217
          Policy                        : 0                     0
          Congestion                    : 0                     0
          Errors                        : 97790                 47801217
      Modified in ISA-AAs
          Packet size increased         : 0                     0
          Packet size decreased         : 0                     0
      Errors (policy bypass)            : 28283549              21160338635
      Exited ISA-AAs                    : 7437950082379         5530634181963568
      Returned discards                 : 0                     0
          Congestion                    : 0                     0
          Errors                        : 0                     N/A
      Returned traffic                  : 7437950054070         5530634162337570
      ===============================================================================
    

This can also be run on a per-ISA basis:

show application-assurance group 1 status isa <slot/port> detail                          
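The Current/Average/Peak rows of this output lend themselves to simple scripted checks. The following is a minimal sketch, assuming the row layout shown in the sample output above; the helper names are illustrative. Because the configured watermarks apply per ISA, a meaningful comparison would use the per-ISA form of the command rather than the group-wide totals used here for illustration.

```python
import re

SAMPLE = """\
Active Flows                      : 2911508    2769454    4582522
Flow Setup Rate (per second)      : 33923      29400      67865
Traffic Rate (Mbps)               : 7620       7238       22628
"""

# A row is a label, a colon, and exactly three integer columns.
ROW = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_rate_rows(text):
    """Map each row label to its current, average, and peak values."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, current, average, peak = match.groups()
            rows[name] = {
                "current": int(current),
                "average": int(average),
                "peak": int(peak),
            }
    return rows


def exceeds(rows, name, column, threshold):
    """True when the selected value is above the planning threshold."""
    return rows[name][column] > threshold


rows = parse_rate_rows(SAMPLE)
print(exceeds(rows, "Flow Setup Rate (per second)", "peak", 66500))
```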

Note that for MS-ISA2, there is a maximum AA packet rate of 7 M pps; under most known traffic mix scenarios, the ISA should be safely below this packet rate when at maximum bandwidth throughput. However, it is worth periodically checking this value, because if the maximum packet rate is exceeded, overload and cut-through will result. (For MS-ISA, the maximum packet rate supported is high enough to not be feasible with realistic application-based traffic mixes.)

The ISA aa-performance record should always be enabled in a network for capacity planning purposes in order to properly plan when to add new ISA cards if required and to monitor the network health:


*A:BNG>config>isa# info 
----------------------------------------------
        application-assurance-group 1 aa-sub-scale residential create
            no description
            primary <slot/port> 
            backup <slot/port>
            no fail-to-open
            isa-capacity-cost-high-threshold 304000
            isa-capacity-cost-low-threshold 272000
            partitions
            statistics
                performance
                    accounting-policy 7
                    collect-stats
                exit
            exit
            divert-fc be
            no shutdown
        exit

The statistics performance commands in the configuration above (accounting-policy and collect-stats) will export information on the total traffic load and resource utilization of the ISA card:

  • Flows — active flows, setup rates, resource allocation

  • Traffic rates — bandwidth, packets

  • Subscribers — active, configured, statistics resource allocation in use

The AA statistics collection configuration refers to accounting policies that are also defined in the SR OS node:

*A:BNG>config# log
    file-id 7 
        description "ISA Performance Stats"
        location cf2: 
        rollover 15 retention 12 
    exit 
    accounting-policy 7 
        description "ISA Performance Stats"
        collection-interval 15 
        record aa-performance 
        to file 7 
        no shutdown 
    exit 

From the AA performance record, the fields listed in Table 1 can be used to track ISA load in the reporting interval (typically a 15 to 60 minute period):

Table 1. Tracking ISA load in the reporting interval

Record name  Type          Description                                 Load planning use
-----------  ------------  ------------------------------------------  ---------------------------------------------------------------
dco          cumulative    octets discarded due to congestion in MDA   Should be 0; ISA internal congestion
dcp          cumulative    packets discarded due to congestion in MDA  Should be 0; ISA internal congestion
dpo          cumulative    octets discarded due to policy in MDA       Not related to load planning
dpp          cumulative    packets discarded due to policy in MDA      Not related to load planning
pbo          cumulative    octets policy bypass                        Not used. Traffic was for an invalid subscriber and the group was "no fail-to-open"
pbp          cumulative    packets policy bypass                       Not used. Traffic was for an invalid subscriber and the group was "no fail-to-open"
nfl          cumulative    number of flows                             informative
caf          intervalized  current active flows                        informative
aaf          intervalized  average active flows                        informative
paf          intervalized  peak active flows                           Check vs max
cfr          intervalized  current flow setup rate                     informative
afr          intervalized  average flow setup rate                     Check meets expected norms; increasing over time increases load
pfr          intervalized  peak flow setup rate                        informative
ctr          intervalized  current traffic rate                        informative
atr          intervalized  average traffic rate                        Check meets expected norms; increasing over time increases load
ptr          intervalized  peak traffic rate                           Check vs max
cpr          intervalized  current packet rate                         informative
apr          intervalized  average packet rate                         informative
ppr          intervalized  peak packet rate                            informative
cds          intervalized  current diverted subscribers                informative
ads          intervalized  average diverted subscribers                informative
pds          intervalized  peak diverted subscribers                   Check vs max and expected norms; increasing over time increases load
rfi          intervalized  flows in use                                Check vs max and expected norms; increasing over time increases load
rcc          cumulative    ISA capacity cost                           Check meets expected norms; increasing over time increases load
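The load planning guidance in Table 1 can be applied to a decoded record with a few lines of code. The following is a minimal sketch, assuming the record has already been parsed into a dictionary keyed by the record names in Table 1; the limit values, the 90% warning fraction, and the function name are hypothetical planning inputs, not system values.

```python
ZERO_FIELDS = ("dco", "dcp")  # congestion discards, which should stay at 0


def load_planning_findings(record, limits, warn_fraction=0.9):
    """Return human-readable findings for one aa-performance record.

    record: dict of Table 1 record names to values.
    limits: dict of record names (for example paf, ptr, pds, rfi) to the
            maximum value planned for that resource.
    """
    findings = []
    for name in ZERO_FIELDS:
        value = record.get(name, 0)
        if value != 0:
            findings.append(f"{name}={value}: expected 0 (ISA internal congestion)")
    for name, limit in limits.items():
        value = record.get(name)
        if value is not None and value >= warn_fraction * limit:
            findings.append(
                f"{name}={value}: at or above {warn_fraction:.0%} of limit {limit}"
            )
    return findings


# Hypothetical record and limits, for illustration only.
record = {"dco": 0, "dcp": 12, "paf": 3_900_000, "ptr": 7000}
limits = {"paf": 4_000_000, "ptr": 10_000}
for finding in load_planning_findings(record, limits):
    print(finding)
```

Fields marked "Check meets expected norms" in Table 1 are better handled by trending over many reporting intervals than by a single-record check like this one.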

The intended deployment model is for this statistic record to be collected by a centralized service-aware management system along with all other AA records and to be stored in a reporting and analysis management database for subsequent analytics purposes, such as trending charts or setting thresholds of key values. It is recommended that a CRON script be used to export the AA performance record to a storage server for post processing if no reporting and analysis management tool is deployed:

  • If no reporting and analysis management tool is deployed in the network, it is possible to automatically collect the XML accounting files and provide high-level reporting through an XML-to-CSV conversion.

  • The simplest approach is to configure a CRON script on the SR OS node to automatically retrieve the CF accounting file (alternatively, any other scripting mechanism with an interval smaller than the retention period can be used).

  • It is recommended that the rollover interval of the file-id policy be modified to 6 hours or above in order to collect fewer files while keeping the same collection interval.

    *A:BNG# file type cf2:/script 
    file copy cf2:/act/*.gz ftp://login:password@IP-ADDRESS/acct/router1/
     
    *A:BNG>config>cron# info 
    ----------------------------------------------
            script "test-ftp-act"
                location "cf2:/script"
                no shutdown 
            exit
            action "cron1"
                results "ftp://login:password@IP-ADDRESS/results/router1-result.log"
                script "test-ftp-act"
                no shutdown 
            exit
            schedule "schedule1"
                interval 36000
                action "cron1"
                no shutdown 
            exit
    

The schedule interval of 36000 is expressed in seconds (10 hours).

With this XML to CSV export mechanism, a spreadsheet can be used by the network engineer to periodically track the ISA resource utilization.

ISA overload cut-through

The system can be configured to react to overload based on the weighted-average (WA) queue depth of the shared network port buffer pool in the from-subscriber and to-subscriber directions. Overload cut-through is typically recommended when AA is used for value-added services where, in the event of overload, the preference is for the ISA to continue to pass packets without AA processing. Firewall use cases, however, usually prefer to drop excess traffic in the event of overload, in which case overload cut-through may not be desired.

If the isa-overload-cut-through command is enabled, then in addition to triggering an alarm, packets sent to the ISA after the WA high-watermark threshold is reached are cut through immediately by the ISA card, without application identification or subscriber policy processing.

The WA queue depth is typically configured based on the maximum tolerated delay for the service diverted and the amount of shared buffer space allocated from the IOM.

AA deployment recommended settings:

  • high watermark — 33% of the maximum MBS for all diverted network queues

  • low watermark — 5% of the maximum MBS for all diverted network queues

The recommended high and low watermarks assume that the sum of the network port egress queue MBS sizes is 100% of the shared buffer. If this maximum network queue size is further reduced in the network QoS policy, the watermark values must be adapted proportionally. For example, if the total MBS size cannot exceed 50% of the shared buffer, the watermark values are divided by 2 and rounded down: the high watermark = 33% / 2 = 16%, the low watermark = 5% / 2 = 2%. Adjusting the MBS and the wa-shared-high-wmark and wa-shared-low-wmark values proportionately ensures that the MBS point (after which discards occur) is above the WA shared high-watermark threshold; otherwise, MBS discards occur first and the ISA never enters the overload state.

A:BNG# configure isa application-assurance-group 1
            isa-overload-cut-through
            qos 
                egress
                    from-subscriber
                        wa-shared-high-wmark 16
                        wa-shared-low-wmark 2
                    exit
                    to-subscriber
                        wa-shared-high-wmark 16
                        wa-shared-low-wmark 2
                    exit
                exit
            exit
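
The proportional adjustment described above can be sketched in Python as follows. This is an illustrative calculation only, not part of SR OS: the function name scale_watermarks and its mbs_share_pct parameter are hypothetical, and rounding down to a whole percentage is an assumption taken from the 50% example (33% / 2 = 16%, 5% / 2 = 2%).

```python
import math

# Baseline watermarks from the recommended settings; they assume that the
# network egress queue MBS sizes add up to 100% of the shared buffer.
BASE_HIGH_WMARK_PCT = 33
BASE_LOW_WMARK_PCT = 5


def scale_watermarks(mbs_share_pct):
    """Scale the recommended watermarks to the MBS share of the shared buffer.

    mbs_share_pct (hypothetical parameter) is the total MBS expressed as a
    percentage of the shared buffer. Results are rounded down to whole
    percentages, which is an assumption based on the 50% example.
    """
    if not 0 < mbs_share_pct <= 100:
        raise ValueError("mbs_share_pct must be greater than 0 and at most 100")
    high = math.floor(BASE_HIGH_WMARK_PCT * mbs_share_pct / 100)
    low = math.floor(BASE_LOW_WMARK_PCT * mbs_share_pct / 100)
    return high, low
```

Calling scale_watermarks(50) returns (16, 2), which matches the wa-shared-high-wmark and wa-shared-low-wmark values in the configuration above.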

The show isa application-assurance-group command can be used to verify that overload cut-through is enabled.

*A:BNG>show isa application-assurance-group 1
=============================================================
ISA Application-assurance-groups
=============================================================
ISA-AA Group Index          : 1
Description                 : (Not Specified)
Subscriber Scale            : residential
WLAN GW Group Index         : N/A
Primary ISA-AA              : 1/2 up/active
Backup ISA-AA               : 2/1 down
Last Active change          : 07/02/2014 12:17:45
Admin State                 : Up
Oper State                  : Up
Diverted FCs                : be
Fail to mode                : fail-to-wire
Partitions                  : enabled
QoS
  Egress from subscriber
    Pool                    : default
      Reserved Cbs          : default
      Slope Policy          : default
    Queue Policy            : default
    Scheduler Policy        :
  Egress to subscriber
    Pool                    : default
      Reserved Cbs          : default
      Slope Policy          : default
    Queue Policy            : default
    Scheduler Policy        :
Capacity Cost
    High Threshold          : 4294967295
    Low Threshold           : 0
Overload Cut Through        : enabled
Transit Prefix
    Max IPv4 entries        : 0
    Max IPv6 entries        : 0
    Max IPv6 remote entries : 0
HTTP Enrichment
    Max Packet Size         : 1500 octets
========================================================================

To monitor the load status of an ISA, enter the following CLI command.

*A:BNG>show application-assurance group 1 status isa 5/1 cpu 
==========================================
Application-Assurance ISA CPU Utilization
(Test time 993791 uSec)
==========================================
Management CPU Usage
------------------------------------------
Name               CPU Time    CPU Usage
                  (uSec)
------------------------------------------
System               14277         1.43%
Management           61101         6.15%
Statistics           69850         7.02%
Idle                848563        85.39%
==========================================
Datapath CPU Usage
------------------------------------------
Name              CPU Time     CPU Usage
                  (uSec)
------------------------------------------
System               14277         1.43%
Packet Processing    61101         6.15%
Application ID       69850         7.02%
Idle                848563        85.39%

Additionally, the system log files can be used to examine the AA overload history to determine when the overload state was entered and exited. It can be helpful to send AA events to a separate log using the following configuration:

    log 
        filter 45 
            default-action drop
            entry 10 
                action forward
                match
                    application eq "application_assurance"
                exit 
            exit 
        exit 
        log-id 45 
            description "application-assurance log"
            filter 45 
            from main 
            to memory 500
        exit 

The log files can then be examined to see if overload has occurred, and how frequently. If overload occurs with any regularity, it is a situation that should be addressed. Below is an example of a log file showing AA overload:

A:BNG# show log log-id 45 
===============================================================================
Event Log 45
===============================================================================
Description : application-assurance log
warning: 13 events dropped from log
Memory Log contents  [size=500   next event=16  (not wrapped)]

15 2014/08/14 17:00:32.66 EST WARNING: APPLICATION_ASSURANCE #4433 Base 
"ISA AA Group 1 MDA 5/1 exiting overload cut through processing."

14 2014/08/14 17:00:32.55 EST WARNING: APPLICATION_ASSURANCE #4431 Base 
"ISA-AA group 1 MDA 5/1 wa-shared buffer use is less than or equal to 1% in the to-subscriber direction or corresponding tmnxBsxIsaAaGrpToSbWaSBufOvld notification has been disabled."

13 2014/08/14 17:00:32.06 EST WARNING: APPLICATION_ASSURANCE #4432 Base 
"ISA AA Group 1 MDA 5/1 entering overload cut through processing."

12 2014/08/14 17:00:32.05 EST WARNING: APPLICATION_ASSURANCE #4430 Base 
"ISA-AA group 1 MDA 5/1 wa-shared buffer use is greater than or equal to 35% in the to-subscriber direction."

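To see how often and for how long overload occurred, the entering and exiting events can be paired offline. The following Python sketch assumes that the log has been saved as text in the layout shown above (a header line with sequence number and timestamp, followed by the quoted message); the function name overload_periods and the sample text are illustrative, and events are matched on the message wording rather than on event numbers.

```python
import re
from datetime import datetime

# Two events copied from the example log above (newest first, as displayed).
SAMPLE_LOG = '''\
15 2014/08/14 17:00:32.66 EST WARNING: APPLICATION_ASSURANCE #4433 Base
"ISA AA Group 1 MDA 5/1 exiting overload cut through processing."

13 2014/08/14 17:00:32.06 EST WARNING: APPLICATION_ASSURANCE #4432 Base
"ISA AA Group 1 MDA 5/1 entering overload cut through processing."
'''

# Header line: sequence number followed by date and time.
HEADER = re.compile(r"^\d+ (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) ")
# Message line: MDA identifier and the overload transition.
MESSAGE = re.compile(r"MDA (\S+) (entering|exiting) overload cut through")


def overload_periods(log_text):
    """Return (mda, start, end) tuples for each completed overload period."""
    events = []
    timestamp = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            timestamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S.%f")
            continue
        message = MESSAGE.search(line)
        if message and timestamp is not None:
            events.append((timestamp, message.group(1), message.group(2)))

    periods = []
    entered = {}
    for timestamp, mda, state in sorted(events):  # process oldest event first
        if state == "entering":
            entered[mda] = timestamp
        elif mda in entered:
            periods.append((mda, entered.pop(mda), timestamp))
    return periods
```

For the sample text, overload_periods(SAMPLE_LOG) yields one period for MDA 5/1 lasting 0.6 seconds (17:00:32.06 to 17:00:32.66).
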
In the CLI statistics, the primary indicator of ISA load is datapath CPU usage. Regardless of the configuration and traffic profiles in use, datapath CPU usage gives a consistent indication of whether the ISA is under heavy load, because overload is caused by the inability of the ISA to perform more tasks. Idle datapath time is not proportional to bandwidth throughput, but idle datapath CPU usage under 5% indicates that the ISA is approaching its maximum processing load.

At an average datapath use of 95-100% (less than 5% idle), the ISA is creating latency and backpressuring the host IOM egress. Datapath CPU usage is therefore the best way to know how close to overload the ISA has been. Attempting to examine data throughput statistics such as bit rate, flow setup rate, and packet rate to predict overload is not recommended, as these are quite variable under normal circumstances and are not directly correlated to overload. Once the ISA is in overload, the data statistics (volume, setup rate, and so on) are useful for determining what threshold traps to put in place for the future, but the needed thresholds will always be specific to the live deployment traffic mix and policy configuration.

Below is an example of the status for an ISA that is fully loaded but not yet in overload:

*A:BNG>show application-assurance group 1 status isa 5/1 cpu 
==========================================
Application-Assurance ISA CPU Utilization
==========================================

----------------------------------------------
Management CPU Usage (Test time 999636 uSec)
----------------------------------------------
Name                  CPU Time     CPU Usage
                      (uSec)     
----------------------------------------------
System                    1540         0.15%
Management                  14        ~0.00%
Statistics              643955        64.42%
ICAP Client                603         0.06%
Idle                    353524        35.37%
----------------------------------------------

----------------------------------------------
Datapath CPU Usage   (Test time 999735 uSec)
----------------------------------------------
Name                  CPU Time     CPU Usage
                      (uSec)     
----------------------------------------------
System                  188374        18.84%
Packet Processing       534203        53.43%
Application ID          277158        27.72%
Idle                         0         0.00%
----------------------------------------------

In this example, 0% idle datapath CPU means the ISA is fully used. When the Datapath CPU Usage Idle average is consistently in the 5-10% range, the ISA should be considered "full"; to add new subscribers, more ISAs are required.
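
The idle thresholds above can be applied automatically to captured CLI output. The following Python sketch assumes the output layout shown in the examples in this section; the function names and the "headroom available" label are hypothetical. It classifies a single sample, whereas the guidance above refers to the average over time, so in practice several samples should be averaged first.

```python
import re

# Abbreviated copy of the fully loaded example above.
SAMPLE_OUTPUT = """\
Management CPU Usage (Test time 999636 uSec)
Statistics              643955        64.42%
Idle                    353524        35.37%
Datapath CPU Usage   (Test time 999735 uSec)
System                  188374        18.84%
Packet Processing       534203        53.43%
Application ID          277158        27.72%
Idle                         0         0.00%
"""

# Idle row: name, CPU time in microseconds, usage percentage (may show "~").
IDLE_LINE = re.compile(r"^Idle\s+\d+\s+~?(\d+(?:\.\d+)?)%")


def datapath_idle_pct(cli_output):
    """Return the Idle percentage from the Datapath CPU Usage section, or None."""
    in_datapath = False
    for line in cli_output.splitlines():
        if line.startswith("Datapath CPU Usage"):
            in_datapath = True
        elif line.startswith("Management CPU Usage"):
            in_datapath = False
        elif in_datapath:
            match = IDLE_LINE.match(line.strip())
            if match:
                return float(match.group(1))
    return None


def classify_isa_load(idle_pct):
    """Map an idle percentage to the load levels described in the text."""
    if idle_pct < 5.0:
        return "approaching maximum processing load"
    if idle_pct <= 10.0:
        return "full"
    return "headroom available"  # hypothetical label, not an SR OS term
```

For the sample output, datapath_idle_pct(SAMPLE_OUTPUT) returns 0.0, which classify_isa_load reports as approaching maximum processing load.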

If the excessive traffic condition persists, backpressure from the ISA causes the IOM to buffer packets in its egress buffers. When the egress MBS is exceeded and cut-through is not enabled, the ISA host IOM reports diverted discards due to congestion:

*A:BNG>show application-assurance group 1 status detail      
===============================================================================
Application-Assurance Status
===============================================================================
Last time change affecting status : 08/12/2014 13:16:15
Number of Active ISAs             : 1
Flows                             : 235754165
Flow Resources In Use             : 12000000
AA Subs Created                   : 14224
AA Subs Deleted                   : 0
AA Subs Modified                  : 1
Seen IP Requests Sent             : 0
Seen IP Requests Dropped          : 0
-------------------------------------------------------------------------------
                                    Current    Average    Peak
-------------------------------------------------------------------------------
Active Flows                      : 8452434    3786948    10632607
Flow Setup Rate (per second)      : 246578     65104      298677
Traffic Rate (Mbps)               : 33702      13229      35813
Packet Rate (per second)          : 6847697    2466118    6945936
AA-Subs Downloaded                : 14224      13710      14224
Active Subs                       : 14224      9934       14224
-------------------------------------------------------------------------------
                                    Packets               Octets
-------------------------------------------------------------------------------
Diverted traffic                  : 8924242848            5983284952320
Diverted discards                 : 752486                729147667
    Congestion                    : 752486                729147667
    Errors                        : 0                     N/A
Entered ISA-AAs                   : 8923417360            5982508976617
Buffered in ISA-AAs               : 57                    19277
Discarded in ISA-AAs              : 0                     0
    Policy                        : 0                     0
    Congestion                    : 0                     0
    Errors                        : 0                     0
Modified in ISA-AAs
    Packet size increased         : 0                     0
    Packet size decreased         : 0                     0
Errors (policy bypass)            : 0                     0
Exited ISA-AAs                    : 8923417303            5982508957340
Returned discards                 : 0                     0
    Congestion                    : 0                     0
    Errors                        : 0                     N/A
Returned traffic                  : 8923285123            5982432640249
===============================================================================
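
To track congestion discards over time, the diverted discard ratio can be derived from the packet counters in this output. The following Python sketch assumes the "label : packets octets" layout shown above; the function names are hypothetical, and the ratio itself is an illustrative metric rather than a value reported by SR OS.

```python
import re

# Counter lines copied from the status output above.
SAMPLE_STATUS = """\
Diverted traffic                  : 8924242848            5983284952320
Diverted discards                 : 752486                729147667
    Congestion                    : 752486                729147667
"""


def packet_counter(status_text, label):
    """Return the packet count (first numeric column) for the given label."""
    pattern = re.compile(
        r"^\s*" + re.escape(label) + r"\s*:\s*(\d+)", re.MULTILINE
    )
    match = pattern.search(status_text)
    if match is None:
        raise ValueError("counter not found: " + label)
    return int(match.group(1))


def diverted_discard_ratio(status_text):
    """Return diverted discards as a fraction of diverted packets."""
    diverted = packet_counter(status_text, "Diverted traffic")
    discards = packet_counter(status_text, "Diverted discards")
    return discards / diverted if diverted else 0.0
```

For the sample counters, the ratio is 752486 / 8924242848, or roughly 0.008% of diverted packets. A ratio that grows between collection intervals suggests that the host IOM egress MBS is being exceeded regularly.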

ISA default subscriber policy

A default subscriber policy is an AQP whose match criteria do not use App-ID or 5-tuple. The match includes only traffic direction, ASO characteristic, subscriber name, or a combination of these.

It is recommended that each ISA be configured with some default subscriber policies that get applied to all subscribers at all times, independent of application flow ID, and even when an ISA is in overload cut-through. These policies protect the ISA resources and provide fairness of resource allocation between subscribers by limiting the ISA resources that can be consumed by a single subscriber. The following policers are a recommended starting point; in all cases, network-specific tuning is recommended:

  • Per-subscriber flow rate policer: set to a value higher than the expected maximum peak per-subscriber flow rate for active subscribers. The policer prevents one subscriber from attacking the network with an excessive flow rate and affecting ISA flow rate resources used by other customers. A typical rate for residential networks could be 100 flows per second (fps) per subscriber.

  • Per-subscriber flow count policer: set to a value higher than the expected maximum per-subscriber flow count for active subscribers. The policer prevents one subscriber from consuming excessive flow counts and affecting ISA flow resources used by other customers.

  • Downstream bandwidth per subscriber: set to a value higher than the maximum rate supported by the service, or to less than the maximum per-subscriber capability of the ISA, whichever is lower. For fixed networks, several default policer rates are recommended, using a per-subscriber ASO value for low, medium, and high rate ranges, each set at a rate related to the subscriber access speed. For example, for an FTTH service the per-subscriber policers could cover three ranges: one policer for services below 25Mbps, a 100Mbps policer for services between 25Mbps and 100Mbps, and another policer for services between 100Mbps and 1Gbps. For a mobile 3G network the rate may be 1Mbps, and for an LTE network the rate may be 10Mbps.

For a CLI example of a default subscriber policy, see Application Assurance — App-Profile, ASO and Control Policies.

Conclusion

Any deployment of Application Assurance should include careful capacity planning of the ISA resources, with an appropriate ISA overload strategy, whether for overload cut-through to keep excess traffic flowing, or with a discard policy engineered in the host IOM egress QoS policies.

ISA resource use should be monitored via appropriately configured resource thresholds, events, log files, XML records and show screens to ensure that sufficient ISA resources are available as required.