Quality of service overview

Quality of Service (QoS) provides an appropriate level of service for packets as they flow inside the switch and between switches in the network. The required level of service depends on the application that generates the flow of packets, and can be defined by the application’s sensitivity to packet loss, delay, and jitter.

QoS functionality is supported on the 7250 IXR, 7220 IXR-D2, D3, and D5, and the 7220 IXR-H2 and H3.

Note: The 7220 IXR-D5 supports the following subset of SR Linux QoS functionality:
  • DSCP classifier and rewrite-rule policies (VXLAN not supported)
  • Queue depth (unicast only)
  • WRED slope
  • ECN slope
  • WRR
  • Strict priority scheduling
  • Forwarding class peak rate (unicast only)

You can group packets that require a similar treatment (per-hop behavior) into a Forwarding Class (FC), also known as a behavior aggregate. You can specify up to eight FCs. Traffic is scheduled and can optionally be marked based on its FC.

A configurable drop probability expresses the packet loss sensitivity. Assign a low drop probability to packets that are sensitive to loss. To provide the required congestion management and intelligent discard decisions when congestion occurs, balance the traffic classifications between low, medium, and high drop probability.

How QoS works for transit traffic

This section describes how QoS applies to transit packets on the SR Linux.

  1. Packets are received on a subinterface.

  2. Each received packet is classified as belonging to one of eight forwarding classes (corresponding to forwarding class indexes 0 to 7) and one of three drop probabilities (low, medium, or high).
    • For IP packets:
      • If the packet matches a multifield classifier policy configured on the ingress subinterface, the forwarding class (FC) and drop probability level are determined entirely from that policy. In addition, if this policy includes a DSCP rewrite action, the DSCP value for the packet is rewritten accordingly.

      • Otherwise, if the packet matches a DSCP classifier policy configured on the ingress subinterface, the forwarding class and drop probability level are determined from that policy.

        Note: If no entry in this policy matches the received DSCP, the assigned forwarding class index is 0 and the assigned drop probability is low. This classification corresponds to best-effort treatment.
      • If there is no multifield classifier or DSCP classifier policy bound to the ingress subinterface, the FC and drop probability are determined from the default DSCP classifier policy. See System default DSCP classifier policy.

    • For VLAN-encapsulated, non-IP packets:

      • If the packet matches a dot1p (IEEE 802.1p) classifier policy configured on the VLAN subinterface, the FC and drop probability are determined from that policy.
      • If there is no matching dot1p classifier policy, or no dot1p policy is explicitly bound to the VLAN subinterface, the FC and drop probability are determined from the default dot1p policy.
  3. Both IP and non-IP traffic can be directed to a subinterface traffic policer. In this case, packets are metered to determine compliance with a traffic profile. At the output of the policer, every packet is marked with a color (green, yellow, or red) that represents whether it conforms, exceeds, or violates the traffic profile. The drop probability for all packets can then be updated based on their conformance to the policy, and violating (red) packets can be dropped altogether.
  4. A forwarding lookup on the packet determines its egress port.

  5. On the 7250 IXR, if the packet is a unicast packet, it is associated with a Virtual Output Queue (VOQ) based on the ingress port, egress port, and FC.

    On a 7220 IXR-D2, D3, and D5 or 7220 IXR-H2 and H3, the packet is associated directly with an Egress Queue (EGQ) of the egress port, based on the FC of the packet and its type (either unicast or multicast).

  6. While it waits for its VOQ or EGQ to be serviced, the packet is stored in buffer memory. The total amount of buffer memory varies by platform.

  7. The packet is dropped if the buffer memory is close to full or if the Maximum Burst Size (MBS) of the VOQ or EGQ is exceeded.

    The MBS is one of the parameters that is configurable in a queue template. When a queue template is applied to a set of queues, all of those queues have the MBS value specified in the template. If the MBS is not specified in a queue template, the default value is platform dependent. The MBS is not a guaranteed allocation of buffer memory.

  8. When the packet is Explicit Congestion Notification (ECN)-capable, ECN is enabled globally with the qos explicit-congestion-notification command, and the VOQ or EGQ has an active ECN slope that applies to the packet, the ECN field may be remarked depending on the current (weighted) queue depth.

    • If the current queue depth is below the configured min-threshold-percent of the ECN slope, the ECN field of the packet is unchanged.

    • If the current queue depth is above the configured max-threshold-percent of the ECN slope, the ECN field of the packet is (re)marked as Congestion Experienced (CE), ECN=11.

    • If the current queue depth is between the min-threshold-percent and max-threshold-percent of the ECN slope, the ECN field of the packet is (re)marked as CE, ECN=11, based on a probability function that increases linearly from 0% at the minimum threshold to n% at the maximum threshold, where n is the operational max-probability of marking the packet.

      Note: The operational values of the max-probability may be significantly different from the configured values based on internal hardware calculations. You can check the hardware-configured values for any slope calculations.
  9. When the packet is non-ECN-capable (the ECN field is zero) and the egress queue has an active WRED slope for the drop probability of the packet, the packet may be dropped by the WRED algorithm, which operates as follows:

    • If the current queue depth is below the configured min-threshold-percent of the WRED slope, the packet is admitted to the queue.

    • If the current queue depth is above the configured max-threshold-percent of the WRED slope, the packet is dropped.

    • If the current queue depth is between the min-threshold-percent and max-threshold-percent of the WRED slope, the packet is dropped based on a probability function that increases linearly from 0% at the minimum threshold to n% at the maximum threshold, where n is the operational max-probability of dropping the packet.
      Note: The operational values of the max-probability may be significantly different from the configured values based on internal hardware calculations. You can check the hardware-configured values for any WRED slope calculations.
  10. Each unicast queue and each multicast queue of an egress port is associated with a scheduler node. The mapping of queues to scheduler nodes is platform-dependent and cannot be configured. See Output queue scheduling.

  11. Each egress queue can be individually configured with a Peak Information Rate (PIR). The PIR is configured as a percentage of the egress port bandwidth.

    By default, the PIR of each queue is 100%. The operational PIR is stored by the peak-rate-bps leaf in bits per second. The bits counted in this rate include the Layer 2 framing of the packet (including the 14-byte Ethernet header, the 4-byte VLAN header, and the 4-byte CRC) but exclude the 20-byte Layer 1 overhead (SFD, preamble, IPG).

  12. The DSCP field in the IPv4 or IPv6 header of the outgoing packet can be rewritten. On the 7250 IXR, the DSCP field must be rewritten when ECN is enabled and the packet ECN field is non-zero. When there is a rewrite policy applied, the DSCP in the outgoing packet is based on the FC (and potentially also the drop probability) of the packet. If the FC (and drop-probability) matches an entry in the applied policy, the new DSCP value is based on the policy entry. If there is no matching entry in the applied policy, the new DSCP value is 0.

  13. For VLAN-tagged traffic, the PCP field in the 802.1p header of the outgoing packet can be rewritten. When there is a dot1p marking policy applied to a subinterface, the dot1p value in the outgoing packet is based on the FC (and potentially also the drop probability) of the packet. If the FC (and drop-probability) matches an entry in the applied policy, the new PCP value is based on the policy entry.
    • On a bridged subinterface, if there is no matching entry in the applied policy, all pushed 802.1Q VLAN tags on the outgoing frame are marked with a PCP value of 0.
    • On a routed subinterface, if there is no dot1p policy applied, the forwarding class index from the ingress classification is encoded into the PCP field.
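
The ECN marking and WRED drop decisions in steps 8 and 9 can be sketched in Python as follows. This is a minimal illustration, not the hardware algorithm: the function names, the tuple layout for a slope, and the choice to express queue depth as a percentage on the same scale as the thresholds are assumptions made for the example. The text above does not say what happens exactly at a threshold, or to an ECN-capable packet when no ECN slope is active, so the sketch picks one plausible behavior for each and labels it in the comments.

```python
import random


def slope_probability(depth_pct, min_pct, max_pct, max_prob_pct):
    """Return the mark/drop probability (0.0 to 1.0) for a queue depth.

    All parameter names are hypothetical. depth_pct, min_pct, and max_pct
    are percentages on the same scale; max_prob_pct is the operational
    max-probability (n in the text).
    """
    if depth_pct < min_pct:
        return 0.0  # below the minimum threshold: never mark or drop
    if depth_pct > max_pct:
        return 1.0  # above the maximum threshold: always mark or drop
    if max_pct == min_pct:
        return max_prob_pct / 100.0  # degenerate slope (assumption)
    # Linear ramp from 0% at the minimum threshold to n% at the maximum.
    fraction = (depth_pct - min_pct) / (max_pct - min_pct)
    return fraction * max_prob_pct / 100.0


def admit(depth_pct, ecn_capable, ecn_slope, wred_slope, rng=random.random):
    """Return 'forward', 'mark-ce', or 'drop' for one packet.

    ecn_slope and wred_slope are (min_pct, max_pct, max_prob_pct) tuples,
    or None when no slope is active for this packet. A disabled global ECN
    setting is modeled here as ecn_slope=None (assumption).
    """
    if ecn_capable and ecn_slope is not None:
        p = slope_probability(depth_pct, *ecn_slope)
        return "mark-ce" if rng() < p else "forward"
    if not ecn_capable and wred_slope is not None:
        p = slope_probability(depth_pct, *wred_slope)
        return "drop" if rng() < p else "forward"
    # No applicable slope: forward unchanged (assumption, not in the text).
    return "forward"
```

Passing a fixed rng, for example rng=lambda: 0.1, makes the probabilistic branch deterministic, which is convenient when checking a slope configuration by hand.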

System default DSCP classifier policy

Table 1. System default DSCP classifier policy
DSCP values | Included DSCP names | Forwarding class | Drop probability
0, 2 to 7 | CS0/BE | fc0 | Low
1 | LE | fc0 | High
8 to 11 | CS1, AF11 | fc1 | Low
12 to 13 | AF12 | fc1 | Medium
14 to 15 | AF13 | fc1 | High
16 to 19 | CS2, AF21 | fc2 | Low
20 to 21 | AF22 | fc2 | Medium
22 to 23 | AF23 | fc2 | High
24 to 27 | CS3, AF31 | fc3 | Low
28 to 29 | AF32 | fc3 | Medium
30 to 31 | AF33 | fc3 | High
32 to 35 | CS4, AF41 | fc4 | Low
36 to 37 | AF42 | fc4 | Medium
38 to 39 | AF43 | fc4 | High
40 to 47 | CS5, EF | fc5 | Low
48 to 55 | CS6/NC1 | fc6 | Low
56 to 63 | CS7/NC2 | fc7 | Low
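
For scripting or test tooling, Table 1 can be expressed as a lookup keyed by DSCP value. The following Python sketch transcribes the rows of the table; the names DEFAULT_DSCP_ROWS, DEFAULT_DSCP_MAP, and classify_default are hypothetical and are not part of SR Linux.

```python
# Rows transcribed from Table 1:
# (first DSCP, last DSCP, forwarding class, drop probability)
DEFAULT_DSCP_ROWS = [
    (0, 0, "fc0", "low"),
    (1, 1, "fc0", "high"),
    (2, 7, "fc0", "low"),
    (8, 11, "fc1", "low"),
    (12, 13, "fc1", "medium"),
    (14, 15, "fc1", "high"),
    (16, 19, "fc2", "low"),
    (20, 21, "fc2", "medium"),
    (22, 23, "fc2", "high"),
    (24, 27, "fc3", "low"),
    (28, 29, "fc3", "medium"),
    (30, 31, "fc3", "high"),
    (32, 35, "fc4", "low"),
    (36, 37, "fc4", "medium"),
    (38, 39, "fc4", "high"),
    (40, 47, "fc5", "low"),
    (48, 55, "fc6", "low"),
    (56, 63, "fc7", "low"),
]

# Expand the ranges into one entry per DSCP value (0 to 63).
DEFAULT_DSCP_MAP = {
    dscp: (fc, drop)
    for first, last, fc, drop in DEFAULT_DSCP_ROWS
    for dscp in range(first, last + 1)
}


def classify_default(dscp):
    """Return (forwarding class, drop probability) for a DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0 to 63)")
    return DEFAULT_DSCP_MAP[dscp]
```

For example, classify_default(46) looks up EF and classify_default(1) looks up LE, the only DSCP in fc0 that is assigned a high drop probability.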

How QoS works for VXLAN traffic

When a 7220 IXR-D2 or D3 receives a terminating VXLAN packet on a subinterface, it classifies the packet to one of eight forwarding classes and one of three drop probabilities (low, medium, or high). The classification is based on the following considerations:

  • The outer IP header DSCP is ignored.

  • If the payload packet is non-IP, the classified FC index is 0 and the classified drop probability is low.

  • If the payload packet is IP, and the qos classifiers vxlan-default command references a classifier policy, that policy is used to determine the FC and drop probability from the header fields of the payload packet.

  • If the payload packet is IP, and the qos classifiers vxlan-default command does not reference a classifier policy, the default DSCP classifier policy is used to determine the FC and drop probability from the header fields of the payload packet.

  • If a dot1p policy is applied on the subinterface, then the PCP field is set to 0. If no dot1p policy is applied, then the FC index value from the ingress classification is encoded into the PCP field.

When the 7220 IXR-D2 or D3 adds VXLAN encapsulation to a packet and forwards it out a subinterface, the inner header IP DSCP value is not modified if the payload packet is IP, even if the egress routed subinterface has a DSCP rewrite rule policy bound to it that matches the packet FC and drop probability. If a DSCP rewrite policy is bound to the egress routed subinterface, that policy modifies the outer header IP DSCP.

Note: If transit VXLAN traffic arrives on a subinterface with a configured subinterface traffic policer, it is policed the same as any other transit traffic. However, if the VXLAN traffic terminates on the subinterface, the policing does not apply.
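
The classification rules for terminating VXLAN packets described in this section can be summarized as a short decision function. The Python sketch below is illustrative only: the payload dictionary keys and the representation of a classifier policy as a callable are assumptions made for the example, not SR Linux data structures.

```python
def classify_vxlan_terminated(payload, vxlan_default_policy, system_default_policy):
    """Return (fc_index, drop_probability) for a terminating VXLAN packet.

    payload is a dict with the hypothetical keys 'is_ip' and 'dscp'
    (taken from the inner header). Each policy is a callable that maps the
    payload to (fc_index, drop_probability). vxlan_default_policy is None
    when qos classifiers vxlan-default references no classifier policy.
    """
    # The outer IP header DSCP is never consulted.
    if not payload["is_ip"]:
        return (0, "low")  # non-IP payload: FC index 0, low drop probability
    if vxlan_default_policy is not None:
        return vxlan_default_policy(payload)
    # Fall back to the system default DSCP classifier policy.
    return system_default_policy(payload)
```

A caller would pass the system default DSCP classifier as system_default_policy, and None as vxlan_default_policy when no policy is referenced.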

How QoS works for router-terminated traffic

This section describes how QoS applies to traffic that terminates on the SR Linux.

  1. A packet is received on a subinterface and is determined to need extraction toward the CPM. The packet is directed to one of the queues associated with the CPM as a destination “physical port” based on its protocol and type. Each traffic type has its own independent queue, for example:

    • sFlow

    • ICMPv4 ping

    • BFD

    • ARP

    • ICMPv6 neighbor solicitation and neighbor advertisement

    • BGP

    • gRPC

    • LLDP

    • IPv4 packets with IP options and IPv6 packets with extension headers

    • DHCPv6

    • IS-IS hello PDUs

    • OSPF/OSPFv3 hello PDUs

  2. Some of the queues toward the CPM have a PIR shaping rate designed to prevent an overload of one type of traffic. The PIR shaping rates vary by platform.

How QoS works for router-originated traffic

This section describes how QoS applies to traffic that originates on the SR Linux.

  1. An application on the SR Linux CPM has an IPv4 or IPv6 packet to send to another system.

  2. The CPM datapath assigns a DSCP to the self-generated packet based on its protocol and the hard-coded mapping shown in Default forwarding class and DSCP marking for router-originated traffic.

    Except for ICMP and ICMPv6 echo-request packets, the DSCP values cannot be overridden. For originated echo-request packets, the DSCP override value can be configured as an optional parameter of the ping command.

  3. The CPM datapath looks up the DSCP from the previous step (either the fixed value or the override value for echo-request) in the default DSCP classifier policy (see System default DSCP classifier policy) to determine the FC and drop probability level.

  4. A forwarding lookup determines the egress port.

  5. On the 7250 IXR, the packet is sent to the egress line card and added to a Virtual Output Queue (VOQ) appropriate for its forwarding class and the egress port. The decision to drop or enqueue the packet in the VOQ and the scheduling of the VOQ follow the previous description for transit traffic. There is no scheduling differentiation between router-originated traffic and transit traffic of the same FC on the egress IMM.

  6. The packet is directed to the egress queue appropriate for its forwarding class and packet type. On the 7220 IXR-D2, D3, and D5 and the 7220 IXR-H2 and H3, the decision to drop or enqueue the packet in the egress queue and the scheduling of the egress queue follow the QoS treatment of transit traffic described in How QoS works for transit traffic.

  7. The DSCP field in the IPv4 or IPv6 header is always written based on the hard-coded mapping described in Default forwarding class and DSCP marking for router-originated traffic. If the packet also matches a DSCP rewrite-rule policy or a dot1p rewrite-rule policy applied to the output subinterface, that policy is ignored.

Default forwarding class and DSCP marking for router-originated traffic

Table 2. Default forwarding class and DSCP marking for router-originated traffic
Protocol / message type | Forwarding class index | Drop probability | DSCP marking
IPv4 ARP request/reply | 6 | Low | N/A
ICMPv4 including echo-request [1], echo-reply [2], dest-unreachable, redirect, time-exceeded, parameter-problem | 0 | Medium | 0
ICMPv4 echo-request with ToS/DSCP override = x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x
ICMPv4 echo-reply to echo-request with non-zero DSCP x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x
UDP traceroute | 0 | Low | 0
IPv6 neighbor solicitation | 6 | Low | 48 (CS6/NC1)
IPv6 neighbor advertisement | 6 | Low | 48 (CS6/NC1)
All other ICMPv6 including dest-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, echo-reply, router-solicitation, redirect | 0 | Medium | 0
ICMPv6 echo-request with DSCP override = x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x
ICMPv6 echo-reply to echo-request with non-zero DSCP x | look up x in system-default DSCP classifier | look up x in system-default DSCP classifier | x
BFD | 6 | Low | 48 (CS6/NC1)
BGP | 6 | Low | 48 (CS6/NC1)
DNS query | 4 | Low | 34 (AF41)
FTP/TFTP | 4 | Low | 34 (AF41)
gNMI | 4 | Low | 34 (AF41)
JSON RPC | 4 | Low | 34 (AF41)
LLDP | N/A | Low | N/A
NTP | 4 | Low | 34 (AF41)
sFlow | 0 | Low | 0
SNMP | 4 | Low | 34 (AF41)
SSH | 4 | Low | 34 (AF41)
Syslog | 4 | Low | 34 (AF41)
TACACS+ | 4 | Low | 34 (AF41)
[1] Echo-request generated by a ping command with no DSCP parameter specified.
[2] Echo-reply to an echo-request packet with DSCP=0.
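
The rule that only originated echo-request packets accept a DSCP override can be sketched as follows. This Python example is an illustration under stated assumptions: the protocol labels used as dictionary keys, the names FIXED_DSCP, OVERRIDABLE, and originated_dscp, and the choice to raise an error for a disallowed override are all hypothetical. Only a subset of Table 2 is transcribed. The forwarding class and drop probability would then follow from looking up the returned DSCP in the system default DSCP classifier policy, which is not repeated here.

```python
# Subset of Table 2: protocol label (hypothetical) -> fixed DSCP marking.
FIXED_DSCP = {
    "bfd": 48,
    "bgp": 48,
    "dns": 34,
    "ntp": 34,
    "snmp": 34,
    "ssh": 34,
    "sflow": 0,
    "udp-traceroute": 0,
    "icmp-echo-request": 0,
    "icmpv6-echo-request": 0,
}

# Only originated echo-request packets accept a DSCP override.
OVERRIDABLE = {"icmp-echo-request", "icmpv6-echo-request"}


def originated_dscp(protocol, dscp_override=None):
    """Return the DSCP written into a router-originated packet."""
    if dscp_override is None:
        return FIXED_DSCP[protocol]
    if protocol not in OVERRIDABLE:
        # Modeled as an error in this sketch; the text only states that
        # the value cannot be overridden.
        raise ValueError(f"DSCP cannot be overridden for {protocol}")
    if not 0 <= dscp_override <= 63:
        raise ValueError("DSCP is a 6-bit field (0 to 63)")
    return dscp_override
```

For example, originated_dscp("icmp-echo-request", 46) models a ping issued with a DSCP override of 46, while originated_dscp("bgp") always yields the fixed CS6/NC1 marking.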