QoS configuration
QoS configuration on SR Linux involves the following tasks:
DSCP classifier policy configuration for input traffic
When a DSCP classifier policy is applied to a subinterface, the policy attempts to match the 6-bit DSCP value in the IP header of incoming packets to one of its entries. If there is a match, the incoming packet is assigned the specified forwarding class and drop probability; otherwise, the packet is assigned forwarding class 0 (fc0) and low drop probability.
Packets that require similar treatment (per-hop behavior) are grouped into an FC, also known as a behavior aggregate. SR Linux differentiates up to eight forwarding classes (fc0 to fc7).
The drop probability can be high, medium, or low. If a queue template with different WRED slopes is bound to a queue, then packets in that queue with a high drop probability are the first to be dropped when the queue experiences congestion, followed by packets with a medium drop probability, and then by packets with a low drop probability. The default drop probability is low.
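As an illustration of the classification logic only (this is a Python sketch, not an SR Linux API; the function and policy names are hypothetical), the following shows how a classifier policy maps an incoming DSCP value to a forwarding class and drop probability, with the documented fallback to fc0 and low drop probability:
# Illustrative sketch of DSCP classification; not SR Linux internals.
# A classifier policy maps a 6-bit DSCP value (0-63) to a
# (forwarding-class, drop-probability) pair.

new_policy = {
    0: ("fc0", "high"),
    8: ("fc1", "high"),
}

def classify(dscp, policy):
    """Return (forwarding class, drop probability) for an incoming packet."""
    # Unmatched packets fall back to fc0 with low drop probability.
    return policy.get(dscp, ("fc0", "low"))

print(classify(8, new_policy))   # ('fc1', 'high')
print(classify(46, new_policy))  # ('fc0', 'low'): no entry for DSCP 46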
Configuring DSCP classifier policies
The following example creates a DSCP classifier policy:
--{ candidate shared default }--[ ]--
# info qos classifiers
qos {
classifiers {
dscp-policy new-policy {
dscp 0 {
forwarding-class fc0
drop-probability high
}
dscp 8 {
forwarding-class fc1
drop-probability high
}
}
}
}
To use an existing policy as a starting point for a new one, copy it from state. The following example copies the system default DSCP classifier into a new policy named test:
# copy from state /qos classifiers dscp-policy default to /qos classifiers dscp-policy test
Using a DSCP classifier for VXLAN traffic
On 7220 IXR-D2 and D3 systems, you can use a classifier policy to classify ingress packets received from any remote VXLAN VTEP. The policy applies to the payload packets after VXLAN decapsulation is performed.
The following example shows how the DSCP classifier policy created in the previous example (new-policy) can be used for VXLAN traffic:
--{ candidate shared default }--[ ]--
# info qos classifiers
qos {
classifiers {
vxlan-default new-policy
}
}
DSCP rewrite-rule policy configuration for output traffic
When a DSCP rewrite-rule policy is applied to a subinterface, the policy attempts to match the forwarding class (and optionally the drop-probability) of outbound packets to one of its entries. If there is a match, the DSCP value of the outbound packet is changed to the value specified by the policy. If the forwarding class of the packet does not match a rule of the rewrite-rule policy, the DSCP value is changed to 0.
On 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 systems, if no DSCP rewrite-rule policy is applied to a subinterface, the incoming packet's DSCP remains unchanged at egress.
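The matching logic can be pictured as a two-level lookup. The following Python sketch is illustrative only (not an SR Linux API, and the policy contents are hypothetical): a drop-probability-specific entry overrides the per-FC DSCP value, and a packet whose FC matches no rule is remarked to 0 when a rewrite-rule policy is applied:
# Illustrative sketch of rewrite-rule matching; not SR Linux internals.
# Each rule maps a forwarding class to a DSCP value, optionally refined
# per drop probability.

normalize = {
    "fc1": {None: 10, "low": 11, "high": 13},
    "fc2": {None: 23},
    "fc3": {None: 31},
}

def rewrite_dscp(fc, drop_probability, policy):
    """Return the outbound DSCP for a packet with the given FC and drop probability."""
    rules = policy.get(fc)
    if rules is None:
        return 0  # FC matches no rule: DSCP is rewritten to 0
    # A drop-probability-specific entry overrides the per-FC value.
    return rules.get(drop_probability, rules[None])

print(rewrite_dscp("fc1", "high", normalize))    # 13
print(rewrite_dscp("fc1", "medium", normalize))  # 10
print(rewrite_dscp("fc5", "low", normalize))     # 0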
Configuring DSCP rewrite-rule policies
The following example creates a rewrite-rule policy:
--{ candidate shared default }--[ ]--
# info qos rewrite-rules
qos {
rewrite-rules {
dscp-policy normalize {
map fc0 {
dscp 1
}
map fc1 {
dscp 10
drop-probability low {
dscp 11
}
drop-probability high {
dscp 13
}
}
map fc2 {
dscp 23
}
map fc3 {
dscp 31
}
}
}
}
Queue templates configuration
Queue templates are groups of configuration information that apply to a set of queues. On 7250 IXR systems, the controlled set of queues are VOQs; on 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 systems, the controlled set of queues are egress queues.
The maximum number of queue templates per system varies by platform. On 7250 IXR systems, the maximum is eight queue templates; on 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 systems, the maximum is 62 queue templates.
The following parameters are configurable inside a queue template:
-
The MBS of each queue; this essentially defines the length of each queue. When the queue builds to the MBS level, further packets are dropped. Be aware that discards may occur before the queue reaches MBS (for example, resulting from shared buffer exhaustion, or from the effects of WRED slopes defined for the queue).
-
WRED slopes that define probability curves for discarding packets as a function of (weighted) average queue depth. WRED slopes are not supported for multicast queues.
-
ECN slopes that define probability curves for marking ECN-capable packets as having experienced congestion, instead of discarding them. ECN slopes are not supported for multicast queues.
If a queue (VOQ or egress queue) does not have a queue template binding, it inherits the settings of the default queue template. The default queue template has a platform-specific MBS default value, no defined queue utilization thresholds, no WRED slopes, and no ECN slopes. You cannot display the default queue template, but its effect is visible by reading the state of individual queues that lack a queue template binding.
Configuring queue templates
The following example creates a queue template that can later be bound to one of the following:
-
a set of VOQs on a 7250 IXR system
-
an egress queue on a 7220 IXR-D2, D3, or D5 system
-
an egress queue on a 7220 IXR-H2 or H3 system
--{ candidate shared default }--[ ]--
# info qos
qos {
queue-templates {
queue-template wred-ecn-1 {
}
}
}
Queue depth (maximum burst size)
In a queue template, the maximum-burst-size parameter sets the maximum length of an egress queue or set of VOQs. The queue depth is also known as the Maximum Burst Size (MBS). You must set the maximum-burst-size parameter to a non-zero value to configure WRED slope and ECN slope parameters.
On the 7250 IXR, the maximum-burst-size parameter applies to a set of VOQs. If the parameter is not configured, or is set to 0, the effective MBS of these VOQs is 256 MB.
On the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3, the maximum-burst-size parameter applies to a set of egress queues. If the parameter is not configured or is set to 0, the effective MBS of these egress queues is calculated based on a fair allocation algorithm. You can assign a non-zero MBS value to multicast queues, but Nokia does not recommend this configuration (especially if multicast traffic is being shaped by configuring peak-rate-percent), because it can lead to a shortage of multicast-related buffering resources on 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 systems.
Configuring queue depth (maximum burst size)
The following example specifies the queue depth by setting the maximum-burst-size parameter:
--{ candidate shared default }--[ ]--
# info qos
qos {
queue-templates {
queue-template wred-ecn-1 {
queue-depth {
maximum-burst-size 20
}
}
}
}
WRED slope
In a queue template, you can configure WRED policies to handle congestion when queue space is depleted. Without WRED, when a queue reaches its maximum fill size, the queue discards any packets arriving at the queue (known as tail drop).
WRED policies manage queue depth. They help to prevent congestion by starting random discards when the queue reaches a user-configured threshold, rather than waiting until the queue is full and discarding all incoming packets. By starting random discards at this threshold, an end system can adjust its sending rate to the available bandwidth.
The WRED curve algorithm is based on two user-configurable thresholds (min-threshold-percent and max-threshold-percent) and a discard probability factor (max-probability).
On the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3, you can configure a WRED slope to apply only to TCP or to non-TCP traffic. This can be useful because TCP has built-in mechanisms to adjust its sending rate in response to packet drops. TCP-based senders lower the packet transmission rate when some of the packets fail to reach the far end.
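The resulting drop curve can be expressed directly. The following Python sketch is illustrative only, not SR Linux internals; the behavior above max-threshold-percent is assumed here to be a full tail drop. The default values match the configuration examples that follow:
# Illustrative WRED drop curve; not SR Linux internals.
# Thresholds are percentages of the queue depth (MBS); max_probability
# is the drop probability reached at max_threshold_percent.

def wred_drop_probability(avg_fill_percent,
                          min_threshold_percent=10,
                          max_threshold_percent=25,
                          max_probability=50):
    """Return the drop probability (percent) for an average queue fill level."""
    if avg_fill_percent < min_threshold_percent:
        return 0.0    # below min threshold: no random discards
    if avg_fill_percent >= max_threshold_percent:
        return 100.0  # assumed: all arriving packets dropped above max
    # Linear ramp from 0 to max_probability between the two thresholds.
    span = max_threshold_percent - min_threshold_percent
    return max_probability * (avg_fill_percent - min_threshold_percent) / span

for fill in (5, 10, 17.5, 24, 30):
    print(fill, "->", wred_drop_probability(fill))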
Configuring a WRED slope (7250 IXR)
The following example specifies a WRED slope for low drop probability traffic flowing through a set of VOQs on a 7250 IXR. This WRED slope applies to both TCP and non-TCP traffic.
--{ * candidate shared default }--[ ]--
# info qos
qos {
queue-templates {
queue-template wred-ecn-1 {
active-queue-management {
wred-slope all drop-probability low {
min-threshold-percent 10
max-threshold-percent 25
max-probability 50
}
}
}
}
}
Configuring a WRED slope (7220 IXR)
The following example specifies a WRED slope for TCP traffic that is classified as low drop probability flowing through an egress queue on the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3.
--{ * candidate shared default }--[ ]--
# info qos
qos {
queue-templates {
queue-template wred-ecn-1 {
active-queue-management {
wred-slope tcp drop-probability low {
min-threshold-percent 10
max-threshold-percent 25
max-probability 50
}
}
}
}
}
ECN slope
Some IP applications support the ECN mechanism. With ECN, IP packets originated by such applications are not discarded when they enter a congested queue; instead, they are marked in a special way. The marking uses the two ECN bits in the traffic class field of the IPv4 or IPv6 packet header. The receiver of IP packets marked as having experienced congestion can signal to the sender (through Layer 4 or higher protocols) that it should reduce its sending rate. The advantage of this feedback mechanism is that the sending rate can drop more gradually than the normal response of a TCP sender to packet discards. A more gradual back-off can result in higher effective throughput in the network.
An ECN slope is similar to a WRED slope. It is based on two user-configurable thresholds (min-threshold-percent and max-threshold-percent) and a marking probability factor (max-probability).
To use an ECN slope, you must configure explicit-congestion-notification.
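The marking decision can be sketched as follows (illustrative Python, not SR Linux internals; the probability ramp has the same shape as a WRED slope). The ECN codepoints are the standard RFC 3168 values:
# Illustrative ECN marking decision; not SR Linux internals.
# The ECN field is the two low-order bits of the IPv4/IPv6 traffic class
# octet (RFC 3168): 00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE.
import random

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def handle_congestion(ecn_bits, mark_probability):
    """Return (new ECN bits, dropped) for a packet entering a congested queue."""
    if random.random() * 100 >= mark_probability:
        return ecn_bits, False      # not selected by the slope this time
    if ecn_bits in (ECT0, ECT1):
        return CE, False            # ECN-capable: mark instead of discarding
    return ecn_bits, True           # not ECN-capable: discard (WRED behavior)

print(handle_congestion(ECT0, mark_probability=100))     # (3, False): marked CE
print(handle_congestion(NOT_ECT, mark_probability=100))  # (0, True): dropped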
Configuring an ECN slope (7250 IXR)
On 7250 IXR systems, the configuration requires you to specify an ECN DSCP policy; this is the DSCP rewrite policy that is used when an ECN field rewrite must be performed. In addition, you can only have one ECN slope per queue and it applies to all drop-probability levels.
The following example specifies an ECN slope applicable to a 7250 IXR system:
--{ candidate shared default }--[ ]--
# info qos
qos {
explicit-congestion-notification {
ecn-dscp-policy normalize
}
queue-templates {
queue-template wred-ecn-1 {
queue-depth {
maximum-burst-size 20
}
active-queue-management {
ecn-slope {
ecn-drop-probability all
ecn-min-threshold-percent 50
ecn-max-threshold-percent 50
max-probability 100
}
}
}
}
}
Configuring an ECN slope (7220 IXR)
On the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3, you can have one ECN slope per drop-probability level of traffic flowing through an egress queue.
The following example specifies an ECN slope applicable to a 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 system:
--{ candidate shared default }--[ ]--
# info qos
qos {
explicit-congestion-notification {
}
queue-templates {
queue-template 2 {
queue-depth {
maximum-burst-size 100
}
active-queue-management {
ecn-slope high {
min-threshold-percent 0
max-threshold-percent 80
max-probability 90
}
}
}
}
}
Queue utilization thresholds
When a router receives a burst of traffic, and the incoming rate exceeds the available transmission rate, the router queues the excess traffic. If the burst lasts long enough, or it is followed by additional bursts, the queues may overflow, resulting in traffic loss.
To respond to onsets of congestion, you can subscribe to telemetry information that generates an event when specific queues exceed a specified occupancy level.
To assign a utilization threshold to a queue, you must apply a non-default queue template to the queue, and that queue template must specify a non-zero high-threshold-bytes value. When the utilization of the queue crosses the specified high-threshold-bytes value, a hardware interrupt is raised; XDP records the current system time and clears the interrupt. In a scaled setup, XDP may take 10 to 15 ms to process and clear each interrupt, so multiple threshold crossings within a very short period, across one or more queues using the same queue template, may appear as only a single event in the telemetry stream. When the high-threshold-bytes value is 0, the functionality is disabled and no threshold events are generated for the queues covered by the queue template.
SR Linux supports queue utilization thresholds on 7250 IXR, 7220 IXR-D2 and D3, and 7220 IXR-H2 and H3 systems; however, the behavior varies by system.
Configuring queue utilization thresholds on 7250 IXR systems
On a 7250 IXR system, binding a queue template with a non-zero high-threshold-bytes value to an egress queue assigns that threshold value to all the VOQs that logically feed this egress queue.
You can configure each queue template that the system supports with a different high-threshold-bytes value as needed.
Configuring high-threshold-bytes
The following example configures the high-threshold-bytes value to 256255:
--{ candidate shared default }--[ ]--
# qos queue-templates queue-template 2 queue-depth high-threshold-bytes 256255
--{ candidate shared default }--[ ]--
# commit stay
All changes have been committed. Starting new transaction.
Each configured threshold value is rounded down to the nearest multiple of 256 bytes and capped at the MBS value. You can observe the rounding (on a per-VOQ-set basis) using the info from state interface queue-statistics unicast-queue virtual-output-queue queue-depth output. (A VOQ-set consists of the VOQ for core 0 and the VOQ for core 1.)
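The rounding arithmetic can be checked with a short sketch (illustrative Python only; the 2048-byte granularity for 7220 IXR-D2 and D3 systems is described later in this section):
# Illustrative rounding arithmetic for high-threshold-bytes.
# The operational value is rounded down to the platform granularity and
# capped at the queue's MBS.

def operational_threshold(configured, granularity, mbs):
    return min((configured // granularity) * granularity, mbs)

print(operational_threshold(256255, 256, 1203200768))  # 256000 on 7250 IXR
print(operational_threshold(2048999, 2048, 2049024))   # 2048000 on 7220 IXR-D2/D3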
Rounding high-threshold-bytes
In the following example, the high-threshold-bytes value was configured as 256255, but is rounded down to 256000 (a multiple of 256 bytes):
--{ candidate shared default }--[ ]--
# info from state interface ethernet-2/1 queue-statistics unicast-queue 0 virtual-output-queue queue-depth
interface ethernet-2/1 {
queue-statistics {
unicast-queue 0 {
virtual-output-queue {
queue-depth {
maximum-burst-size 1203200768
high-threshold-bytes 256000
}
}
}
}
}
The state tree maintains the time of the last threshold crossing in the interface queue-statistics unicast-queue virtual-output-queue queue-depth last-high-threshold-time leaf. This represents the last time when either VOQ in the VOQ-set (core0/core1) exceeded the operational threshold. The value of this leaf is not cleared when you delete or modify the queue template that is bound to the queue/VOQs or the high-threshold-bytes configuration in the applied queue template.
Configuring queue utilization thresholds on 7220 IXR-D2 and D3 systems
On 7220 IXR-D2 and D3 systems, binding a queue template with a non-zero high-threshold-bytes value to an egress queue causes that threshold value to be used for that specific queue, as long as it is a unicast queue. The configuration of this leaf is ignored when this queue template is attached to a multicast queue.
No more than seven different configured high-threshold-bytes values are allowed across all the queue templates used. The management server rejects a commit that would leave more than seven different values after all adds, deletes, and modifies are processed.
Configuring high-threshold-bytes
The following example configures the high-threshold-bytes value to 2048999:
--{ candidate shared default }--[ ]--
# qos queue-templates queue-template 2 queue-depth maximum-burst-size 2049024 high-threshold-bytes 2048999
--{ candidate shared default }--[ ]--
# commit stay
All changes have been committed. Starting new transaction.
Each configured threshold value (that the management server accepts) is rounded down to the nearest multiple of 2048 bytes and capped at the MBS value. For this reason, do not configure values that round to the same multiple of 2048 bytes, because doing so creates duplicates among the high-threshold-bytes values, of which only seven are allowed. You can display the effect of this rounding using the info from state interface qos output unicast-queue queue-depth command.
Rounding high-threshold-bytes
In the following example, the high-threshold-bytes value was configured as 2048999, but is rounded down to 2048000 (a multiple of 2048 bytes):
--{ candidate shared default }--[ ]--
# info from state interface ethernet-1/3 qos output unicast-queue 0 queue-depth
interface ethernet-1/3 {
qos {
output {
unicast-queue 0 {
queue-depth {
maximum-burst-size 2049024
high-threshold-bytes 2048000
}
}
}
}
}
The state tree maintains the time of the last threshold crossing in the interface qos output unicast-queue queue-depth last-high-threshold-time leaf. This represents the last time the queue exceeded the operational threshold. The value of this leaf is not cleared when you delete or modify the queue template that is bound to the queue or the high-threshold-bytes configuration in the applied queue-template.
Configuring queue utilization thresholds on 7220 IXR-H2 and H3 systems
On 7220 IXR-H2 and H3 systems, binding a queue template with a non-zero high-threshold-bytes value to an egress queue causes that threshold value to be used by each ITM that serves the queue. For a high-threshold event, the queue utilization threshold must be exceeded on either ITM.
No more than seven different configured high-threshold-bytes values are allowed across all the queue templates used. The management server rejects a commit that would leave more than seven different values after all adds, deletes, and modifies are processed.
Configuring high-threshold-bytes
The following example configures the high-threshold-bytes value to 254255:
--{ candidate shared default }--[ ]--
# qos queue-templates queue-template 2 queue-depth maximum-burst-size 2049024 high-threshold-bytes 254255
--{ candidate shared default }--[ ]--
# commit stay
All changes have been committed. Starting new transaction.
Each configured threshold value (that the management server accepts) is rounded down to the nearest multiple of 254 bytes and capped at the MBS value. For this reason, do not configure values that round to the same multiple of 254 bytes, because doing so creates duplicates among the high-threshold-bytes values, of which only seven are allowed. You can display the effect of this rounding using the info from state interface qos output unicast-queue queue-depth command.
Rounding high-threshold-bytes
In the following example, the high-threshold-bytes value was configured as 254255, but is rounded down to 254000 (a multiple of 254 bytes):
--{ candidate shared default }--[ ]--
# info from state interface ethernet-1/3 qos output unicast-queue 0 queue-depth
interface ethernet-1/3 {
qos {
output {
unicast-queue 0 {
queue-depth {
maximum-burst-size 2049024
high-threshold-bytes 254000
}
}
}
}
}
The state tree maintains the time of the last threshold crossing in the interface qos output unicast-queue queue-depth last-high-threshold-time leaf. This represents the last time when either ITM exceeded the operational threshold. The value of this leaf is not cleared when you modify or delete the queue-template that is bound to the queue or the high-threshold-bytes configuration in the applied queue-template.
DSCP classifier policy application to subinterfaces
If you apply a DSCP classifier policy to input traffic on a subinterface, incoming packets are evaluated against the policy, and matching packets are assigned to the forwarding class and drop probability specified by the policy. If no classifier policy is applied to the subinterface, the system default DSCP classifier (with the reserved name default) is used.
Applying a DSCP classifier policy to input traffic (7250 IXR)
The following example applies a DSCP classifier policy to inbound IPv6 traffic on a subinterface with a 7250 IXR system:
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
subinterface 1 {
qos {
input {
classifiers {
ipv6-dscp new-policy
}
}
}
}
}
Applying a DSCP classifier policy to input traffic (7220 IXR)
The following example applies a DSCP classifier policy to inbound traffic on a subinterface with a 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 system:
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
subinterface 1 {
qos {
input {
classifiers {
dscp new-policy
}
}
}
}
}
Rewrite-rule policy application to subinterfaces
When a rewrite-rule policy is applied to output traffic on a subinterface, outbound packets are evaluated against the policy. The policy subjects all packets to remarking, with some exceptions. If no rewrite-rule policy is applied to the subinterface, the DSCP marking of the traffic leaving the subinterface is unchanged, unless it is ECN-capable traffic forwarded by a 7250 IXR system or VXLAN traffic originated by a 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 system. For these exceptions, DSCP may be remarked even in the absence of a rewrite-rule policy applied to the egress subinterface.
On all platforms, rewrite-rule policies do not affect DSCP marking of self-generated traffic.
Applying a rewrite-rule policy to output traffic (7250 IXR)
The following example applies a rewrite-rule policy to outbound IPv4 traffic on a subinterface with a 7250 IXR system:
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
subinterface 1 {
qos {
output {
rewrite-rules {
ipv4-dscp new-rule
}
}
}
}
}
Applying a rewrite-rule policy to output traffic (7220 IXR)
The following example applies a rewrite-rule policy to outbound traffic on a subinterface with a 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3 system:
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
subinterface 1 {
qos {
output {
rewrite-rules {
dscp new-rule
}
}
}
}
}
Output queue scheduling
Each unicast queue and each multicast queue of an egress port is associated with a scheduler node. The mapping of queues to scheduler nodes is platform-dependent and cannot be configured.
On 7250 IXR systems, there are two scheduling nodes per port; one for unicast traffic and one for multicast traffic. The two scheduling nodes have a WRR relationship, but the parameters cannot be adjusted. There is one PIR scheduling loop per scheduling node. The scheduling loop serves the strict priority classes first (in descending order of FC), followed by the WRR classes (by weight), limiting each forwarding class to its PIR (expressed as a percentage of the egress port bandwidth). By default, the PIR of each forwarding class is 100%. Note that multicast traffic handled by the multicast scheduler node is unscheduled and is not subject to the ingress VOQ buffering that applies to unicast traffic.
On 7220 IXR-D2 and D3 systems, the unicast queue and multicast queue for a particular forwarding class make up a queue pair. Each of the eight possible queue pairs of an egress port are associated with a scheduler node. Each scheduler node is served as strict priority (SP) or weighted round robin (WRR). If it is served as WRR, the scheduler node also has an associated weight. The scheduling loop serves the SP nodes first, followed by the WRR nodes by weight. The serving order of SP queues is in descending order of FC: fc7 first, then fc6, then fc5, and so on.
On 7220 IXR-H2 and H3 and 7220 IXR-D5 systems, there is a one-to-one mapping of queues to scheduler nodes. Each scheduler node can be served as SP or WRR. A WRR node has a configurable weight. The scheduling loop serves the SP nodes first, followed by the WRR nodes by weight. The serving order of SP queues is as follows:
-
unicast queue 7 serving fc7
-
unicast queue 6 serving fc6
-
multicast queue 3 serving fc6 and fc7
-
unicast queue 5 serving fc5
-
unicast queue 4 serving fc4
-
multicast queue 2 serving fc4 and fc5
-
unicast queue 3 serving fc3
-
unicast queue 2 serving fc2
-
multicast queue 1 serving fc2 and fc3
-
unicast queue 1 serving fc1
-
unicast queue 0 serving fc0
-
multicast queue 0 serving fc0 and fc1
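The following Python sketch illustrates one pass of this scheduling model (illustrative only, not SR Linux internals): strict-priority queues are drained first in the order listed above, and the remaining bandwidth is shared among WRR queues in proportion to their weights:
# Illustrative sketch of one scheduling pass; not SR Linux internals.
from dataclasses import dataclass

@dataclass
class Queue:
    name: str
    strict_priority: bool
    weight: int = 1       # meaningful only for WRR queues
    demand: float = 0.0   # offered load, in arbitrary bandwidth units

def schedule(queues, capacity):
    """Serve SP queues in list order, then split what is left among WRR queues."""
    served = {}
    # Strict-priority queues drain first, in the listed (descending FC) order.
    for q in (q for q in queues if q.strict_priority):
        served[q.name] = min(q.demand, capacity)
        capacity -= served[q.name]
    # Leftover capacity is divided among WRR queues by weight.
    wrr = [q for q in queues if not q.strict_priority]
    total_weight = sum(q.weight for q in wrr) or 1
    for q in wrr:
        served[q.name] = min(q.demand, capacity * q.weight / total_weight)
    return served

# Two SP queues (fc7, fc6) and two WRR queues with weights 20 and 10.
queues = [
    Queue("unicast-7", True, demand=30),
    Queue("unicast-6", True, demand=30),
    Queue("unicast-1", False, weight=20, demand=100),
    Queue("unicast-0", False, weight=10, demand=100),
]
print(schedule(queues, capacity=100))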
Configuring strict priority (7250 IXR)
The following example configures a queue or scheduler node for strict priority. When strict priority is set to false, the associated queue or scheduler node is configured as WRR. When strict priority is set to true, any configured weight is ignored.
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
qos {
output {
unicast-queue 0 {
scheduling {
strict-priority true
}
}
}
}
}
Configuring strict priority (7220 IXR)
The following example configures a queue or scheduler node for strict priority on the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3. Note that when strict priority is set to false, the associated queue or scheduler node is configured as WRR. When strict priority is set to true, any configured weight is ignored.
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
qos {
output {
scheduler {
tier 1 {
node 0 {
strict-priority true
}
}
}
}
}
}
Configuring WRR (7250 IXR)
The following example configures a queue or scheduler node for WRR. Queues or scheduler nodes that you do not configure with a specific weight have a weight of 1.
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
qos {
output {
unicast-queue 0 {
scheduling {
strict-priority false
weight 20
}
}
}
}
}
Configuring WRR (7220 IXR)
The following example configures a queue or scheduler node for WRR on the 7220 IXR-D2, D3, and D5 or the 7220 IXR-H2 and H3. Queues or scheduler nodes that you do not configure with a specific weight have a weight of 1.
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
qos {
output {
scheduler {
tier 1 {
node 0 {
strict-priority false
weight 20
}
}
}
}
}
}
Configuring forwarding class peak rate
The following example sets the maximum percentage of port bandwidth that is available to traffic of a particular FC. By default, traffic belonging to any FC can use up to 100% of the port bandwidth. For example, with peak-rate-percent 75, traffic of the FC is limited to 75% of the port bandwidth (75 Gb/s on a 100 Gb/s port). The example is applicable to 7250 IXR, 7220 IXR-D2, D3, and D5, and 7220 IXR-H2 and H3 systems.
--{ candidate shared default }--[ ]--
# info interface ethernet-1/1
interface ethernet-1/1 {
qos {
output {
unicast-queue 0 {
scheduling {
peak-rate-percent 75
}
}
}
}
}