DCBX
The Data Center Bridging eXchange (DCBX) protocol is a discovery and exchange protocol that advertises configurations and capabilities between directly connected peers. In addition to propagating configurations, DCBX allows peers to detect misconfigurations.
DCBX is defined by Section 38 of the IEEE 802.1Q-2022 specification. The DCBX protocol information is propagated by LLDP using TLVs as defined in Annexes D.2.8, D.2.9, and D.2.10 of the 802.1Q specification.
Because LLDP is a unidirectional protocol, each node sends its local configuration to its neighbor, and the remote neighbor's state machine determines how to process and apply the received information.
DCBX supports the exchange of PFC and ETS information using the following TLVs (subtype in parentheses):
- PFC configuration TLV (0x0b)
- ETS recommendation TLV (0x0a)
- ETS configuration TLV (0x09)
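As a rough illustration of how a receiver could recognize these TLVs inside an LLDP frame, the following Python sketch classifies a single raw TLV. It assumes the standard LLDP TLV layout (7-bit type, 9-bit length), the organizationally specific TLV type 127, and the IEEE 802.1 OUI 00-80-C2; these values come from the LLDP and IEEE 802.1 conventions rather than from this guide, and the function name `classify_dcbx_tlv` is hypothetical.

```python
import struct

IEEE_8021_OUI = bytes.fromhex("0080c2")  # assumed: IEEE 802.1 OUI
ORG_SPECIFIC_TLV_TYPE = 127               # assumed: LLDP organizationally specific TLV

# Subtypes as listed in this section.
DCBX_SUBTYPES = {
    0x09: "ETS configuration",
    0x0A: "ETS recommendation",
    0x0B: "PFC configuration",
}


def classify_dcbx_tlv(tlv: bytes):
    """Return the DCBX TLV name, or None if the TLV is not one of the three above."""
    if len(tlv) < 6:
        return None
    (header,) = struct.unpack("!H", tlv[:2])
    tlv_type = header >> 9      # upper 7 bits of the header
    length = header & 0x01FF    # lower 9 bits: length of the value field
    if tlv_type != ORG_SPECIFIC_TLV_TYPE or length < 4 or len(tlv) < 2 + length:
        return None
    oui, subtype = tlv[2:5], tlv[5]
    if oui != IEEE_8021_OUI:
        return None
    return DCBX_SUBTYPES.get(subtype)
```

The sketch only identifies the TLV; it does not decode the PFC or ETS fields that follow the subtype byte.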
DCBX operation
In SR Linux, DCBX is enabled on every interface by default. The DCBX TLV for PFC propagates the status for each PFC priority (0 to 7) to the peer as follows:
- If the interface has one or more PFC priorities enabled, DCBX advertises the per-priority status (enabled or disabled) for all PFC priorities.
- If the interface has no PFC priorities enabled, DCBX advertises all PFC priorities as disabled.
In state, the system maintains information about the operational state of DCBX for the local node and for the remote peer. With SR Linux, all DCBX-enabled interfaces are effectively in unwilling state, which means the system never reacts to the state received from the remote peer. Instead, the system maintains the local and remote state to be available for display. Any discrepancy identified between the local and remote state allows for the detection of misconfigurations. You can then update the configuration as required to address the discrepancy.
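The advertisement rule and the discrepancy check described above can be sketched in Python as follows. This is only an illustration of the logic, not SR Linux code; the function names are hypothetical, and the per-priority status is modeled as a simple dictionary of booleans.

```python
def advertised_pfc_status(enabled_priorities):
    """Return the per-priority status (True = enabled) for PFC priorities 0 to 7.

    With no enabled priorities, every priority is advertised as disabled.
    """
    enabled = set(enabled_priorities)
    invalid = enabled - set(range(8))
    if invalid:
        raise ValueError(f"invalid PFC priorities: {sorted(invalid)}")
    return {priority: priority in enabled for priority in range(8)}


def pfc_mismatches(local, remote):
    """List the priorities whose local and remote status differ."""
    return [priority for priority in range(8) if local[priority] != remote[priority]]
```

For example, comparing `advertised_pfc_status([3, 4])` with `advertised_pfc_status([3])` reports priority 4 as a discrepancy that an operator would then resolve in configuration, because the system itself never adopts the remote values.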
Suppress DCBX TLVs
LLDP configuration allows you to suppress advertisement of any of the supported DCBX TLVs. For more information, see the LLDP content in the SR Linux Interfaces Guide.
ETS TLVs
SR Linux supports both types of ETS TLVs, as described in Annexes D.2.8 and D.2.9 of the IEEE 802.1Q-2022 specification:
- ETS configuration TLV — advertises local output scheduler configuration to peers
- ETS recommendation TLV — advertises recommendations to peers for output scheduler configuration
SR Linux always sends and receives both configuration and recommendation TLVs, and the system state displays both sent and received TLVs.
If a system is configured as willing and it receives a recommendation TLV, the system can change its scheduler configuration to match the recommended settings. But if the system is configured as unwilling, it does not change its configuration. In SR Linux, all DCBX-enabled interfaces are in unwilling state, which means the system never acts on received recommendation TLVs, other than displaying the remote state information contained in the recommendation TLV.
The configuration TLV in transmitted messages is constructed based on the scheduler policy configuration under qos scheduler-policies scheduler-policy that is attached to the given interface. If there is no scheduler policy attached, the TLV is based on the default scheduler policy. Similarly, if any queues are not defined in a scheduler policy, the queue values from the default scheduler policy are used.
SR Linux transmits the recommendation and configuration TLVs with the same values, and therefore makes no distinction between the two TLVs in local state.
Traffic-class-to-priority mapping
In SR Linux, the traffic class (TC) index is defined by the forwarding class index, and each forwarding class is associated with an output queue. The traffic-class-to-priority mapping for the scheduler is defined per queue in the QoS scheduler policy.
The max TCs field in the configuration TLV is always set to 0 to advertise eight TCs (forwarding classes) whether or not all eight queues are explicitly configured in the scheduler policy.
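The encoding of eight TCs as 0 follows from the field being only 3 bits wide. A minimal Python sketch of that encoding, assuming a 3-bit field in which 0 stands for eight TCs (the helper names are hypothetical):

```python
def encode_max_tcs(num_tcs: int) -> int:
    """Encode a TC count (1 to 8) into the 3-bit max TCs field; 8 is sent as 0."""
    if not 1 <= num_tcs <= 8:
        raise ValueError("number of traffic classes must be between 1 and 8")
    return num_tcs & 0x7


def decode_max_tcs(field: int) -> int:
    """Decode the 3-bit max TCs field; 0 means eight traffic classes."""
    if not 0 <= field <= 7:
        raise ValueError("max TCs field must fit in 3 bits")
    return 8 if field == 0 else field
```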
Transmission Selection Algorithm (TSA) assignment
ETS TLVs specify bandwidth allocations for WRR traffic classes. Within the TLV, the TSA assignment entry allows a remote system to identify strict-priority queues and non-strict-priority queues. Each TC is assigned a value as follows:
- queues attached to the strict-priority scheduler are set to 0 to indicate that they do not participate in the allocation of bandwidth percentages
- queues attached to the non-strict-priority WRR scheduler are set to 2 to indicate that they do participate in the allocation of bandwidth as a percentage of total interface bandwidth
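The TSA assignment rule above can be sketched in Python as a simple mapping. The input format (a dictionary from TC index to the strings "strict" or "wrr") and the function name are assumptions made for this illustration only.

```python
TSA_STRICT_PRIORITY = 0  # does not participate in bandwidth percentage allocation
TSA_WRR = 2              # participates in bandwidth percentage allocation


def tsa_values(scheduler_by_tc):
    """Map each traffic class to its TSA value based on the scheduler type."""
    tsa = {}
    for tc, kind in sorted(scheduler_by_tc.items()):
        if kind == "strict":
            tsa[tc] = TSA_STRICT_PRIORITY
        elif kind == "wrr":
            tsa[tc] = TSA_WRR
        else:
            raise ValueError(f"unknown scheduler type for TC {tc}: {kind!r}")
    return tsa
```

For instance, `tsa_values({7: "strict", 6: "strict", 2: "wrr", 0: "wrr"})` assigns 0 to TCs 6 and 7 and 2 to TCs 0 and 2.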
Traffic class bandwidth assignment (as percentages) with PIR
The ETS TLV advertises the bandwidth allocated to each traffic class as a percentage of the total interface bandwidth. The sum of all percentages must add up to 100.
However, the distribution of bandwidth can be influenced by the PIR when it is also assigned to individual queues. If the PIR is set to a value below the bandwidth that the traffic class would otherwise receive based solely on weight allocation, the queue is allocated the lower PIR value, and the remaining bandwidth becomes available for other traffic classes.
The following table shows an example of how the PIR can affect the calculation of allocated bandwidth percentages:
| Queue | Weight | Peak-rate-percent | Bandwidth = min(available bandwidth * (weight / total weight), PIR) | Description |
|---|---|---|---|---|
| q0 | 2 | 15 | min(100*(2/10), 15) = 15% | With an assigned weight of 2 out of a total weight of 10 for all queues (2 for q0, 2 for q1, and 1 each for q2 to q7), q0 can be allocated 20% of the total available bandwidth (100%). But given its lower assigned PIR value of 15, the allocation is instead set to 15%, and the remaining 5% of bandwidth is redistributed to the other queues. |
| q1 | 2 | 18 | min(85*(2/8), 18) = 18% | With an assigned weight of 2 out of a total weight of 8 for the remaining queues (2 for q1 and 1 each for q2 to q7), q1 can be allocated 21.25%, its share of the remaining available bandwidth (100% - 15% = 85%, which includes the excess from q0). But again, given its lower assigned PIR value of 18, the allocation is instead set to 18%. The remaining 3.25% of bandwidth is redistributed to the remaining queues. |
| q{2..7} | 1 | 100 | min(67*(1/6), 100) = 11% | With a weight of 1 out of a total weight of 6 for the remaining queues (1 each for q2 to q7), each of the six queues is allocated an equal share, about 11.17%, of the remaining available bandwidth (100% - 15% - 18% = 67%). However, because the TLV publishes traffic class bandwidth at a granularity of 1%, 11.17% is advertised as 11% for q2 to q7. |
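The calculation in the table can be reproduced with a short Python sketch. It assumes that queues whose weight-based share exceeds their PIR are capped at the PIR and that the freed bandwidth is re-shared by weight among the uncapped queues, as the descriptions above suggest. Rounding down to whole percentages is an assumption for this example, and the function names are hypothetical.

```python
import math


def allocate_bandwidth(weights, pir):
    """Return the bandwidth percentage per queue, capped by each queue's PIR.

    weights and pir are dictionaries keyed by queue name; weights must be positive.
    """
    allocation = {}
    remaining_bw = 100.0
    active = dict(weights)
    while active:
        total_weight = sum(active.values())
        shares = {q: remaining_bw * w / total_weight for q, w in active.items()}
        capped = [q for q in active if shares[q] > pir[q]]
        if not capped:
            # No remaining queue is limited by its PIR: weight-based shares stand.
            allocation.update(shares)
            break
        for q in capped:
            # The queue receives its PIR; the excess returns to the pool.
            allocation[q] = float(pir[q])
            remaining_bw -= pir[q]
            del active[q]
    return allocation


def advertised_percentages(allocation):
    """Round each allocation down to the 1% granularity of the TLV (assumed)."""
    return {q: math.floor(value) for q, value in allocation.items()}


weights = {"q0": 2, "q1": 2, **{f"q{i}": 1 for i in range(2, 8)}}
pir = {"q0": 15, "q1": 18, **{f"q{i}": 100 for i in range(2, 8)}}
print(advertised_percentages(allocate_bandwidth(weights, pir)))
```

Capping several queues in the same pass gives the same result as capping them one at a time, because removing a capped queue can only increase the share of the queues that remain.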
Under-utilized bandwidth
The PIR configuration can sometimes lead to under-utilized interface throughput, specifically, if the sum of all PIRs is less than the interface speed. In this case, the sum of all ETS traffic classes does not add up to 100%. To comply with the specification in this case, the system redistributes the available bandwidth among all configured WRR queues.
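This guide does not state how the redistribution is calculated. Purely as an illustration, the following Python sketch shows one possible approach: scale the WRR percentages proportionally and hand out the rounding remainder so that the integer values sum to exactly 100. The proportional scaling, the largest-remainder rounding, and the function name are all assumptions, not documented SR Linux behavior.

```python
def normalize_to_100(percentages):
    """Scale WRR percentages proportionally so the integer results sum to 100."""
    total = sum(percentages.values())
    if total <= 0:
        raise ValueError("at least one queue must have a positive percentage")
    scaled = {q: value * 100 / total for q, value in percentages.items()}
    result = {q: int(value) for q, value in scaled.items()}
    shortfall = 100 - sum(result.values())
    # Give the leftover percentage points to the largest fractional remainders.
    by_remainder = sorted(scaled, key=lambda q: scaled[q] - result[q], reverse=True)
    for q in by_remainder[:shortfall]:
        result[q] += 1
    return result
```

With this approach, two WRR queues limited to 10% and 30% would be advertised as 25% and 75%, which preserves their ratio while satisfying the requirement that the TLV percentages sum to 100.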
Interoperability with LAG
Although the scheduling-related states are published at the LAG level, for the purpose of DCBX ETS, the TLVs must be sent at the individual LAG-member level. In this case, each LAG member advertises the same ETS TLV values.
ETS TLV configuration example
To illustrate the ETS TLV behavior, consider the following QoS scheduler policy configuration example.
--{ candidate shared default }--[ ]--
# info with-context qos queues
qos {
    queues {
        queue af2 {
            queue-index 2
        }
        queue best-effort {
            queue-index 0
        }
        queue expedited {
            queue-index 6
        }
        queue nc {
            queue-index 7
        }
    }
}
--{ candidate shared default }--[ ]--
# info with-context qos forwarding-classes
qos {
    forwarding-classes {
        forwarding-class af2 {
            output {
                unicast-queue af2
            }
        }
        forwarding-class be {
            output {
                unicast-queue best-effort
            }
        }
        forwarding-class ef {
            output {
                unicast-queue expedited
            }
        }
        forwarding-class nc {
            output {
                unicast-queue nc
            }
        }
    }
}
--{ candidate shared default }--[ ]--
# info with-context qos scheduler-policies scheduler-policy policy-name
qos {
    scheduler-policies {
        scheduler-policy policy-name {
            scheduler 0 {
                priority strict
                input 1 {
                    input-type queue
                    queue-name nc
                    peak-rate-percent 4
                }
                input 2 {
                    input-type queue
                    queue-name expedited
                    peak-rate-percent 8
                }
            }
            scheduler 1 {
                input 1 {
                    input-type queue
                    queue-name af2
                    peak-rate-percent 50
                }
                input 2 {
                    input-type queue
                    queue-name best-effort
                    peak-rate-percent 50
                }
            }
        }
    }
}
ETS configuration TLV example output
The following output shows the ETS TLV state that results from applying the preceding configuration to an interface.
--{ state }--[ ]--
# info from state qos interfaces interface eth-2/1 dcbx ets-tlv
remote-credit-based-shaper 0
local-credit-based-shaper 0
remote-maximum-traffic-classes 8
local-maximum-traffic-classes 8
priority 0 {
remote-priority-assignment-configuration 0
remote-priority-assignement-recommendation 0
local-priority-assignment 0
}
priority 1 {
remote-priority-assignment-configuration 1
remote-priority-assignement-recommendation 1
local-priority-assignment 1
}
priority 2 {
remote-priority-assignment-configuration 2
remote-priority-assignement-recommendation 2
local-priority-assignment 2
}
priority 3 {
remote-priority-assignment-configuration 3
remote-priority-assignement-recommendation 3
local-priority-assignment 3
}
priority 4 {
remote-priority-assignment-configuration 4
remote-priority-assignement-recommendation 4
local-priority-assignment 4
}
priority 5 {
remote-priority-assignment-configuration 5
remote-priority-assignement-recommendation 5
local-priority-assignment 5
}
priority 6 {
remote-priority-assignment-configuration 6
remote-priority-assignement-recommendation 6
local-priority-assignment 6
}
priority 7 {
remote-priority-assignment-configuration 7
remote-priority-assignement-recommendation 7
local-priority-assignment 7
}
traffic-class 0 {
remote-traffic-class-bandwidth-configuration 50
remote-traffic-class-bandwidth-recommendation 50
local-traffic-class-bandwidth-assignment 50
remote-tsa-value-configuration 2
remote-tsa-value-recommendation 2
local-tsa-value 2
}
traffic-class 1 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
traffic-class 2 {
remote-traffic-class-bandwidth-configuration 50
remote-traffic-class-bandwidth-recommendation 50
local-traffic-class-bandwidth-assignment 50
remote-tsa-value-configuration 2
remote-tsa-value-recommendation 2
local-tsa-value 2
}
traffic-class 3 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
traffic-class 4 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
traffic-class 5 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
traffic-class 6 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
traffic-class 7 {
remote-traffic-class-bandwidth-configuration 0
remote-traffic-class-bandwidth-recommendation 0
local-traffic-class-bandwidth-assignment 0
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 0
}
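Because SR Linux only displays the remote values and never acts on them, comparing local and remote leaves is left to the operator. As a hedged sketch, the following Python code parses the text of a state output like the one above into dictionaries and lists the traffic classes whose local and remote TSA values differ. It assumes the output has been captured as plain text in the block layout shown; the function names are hypothetical, and the leaf names are taken from the output above.

```python
def parse_blocks(text, block_keyword):
    """Parse 'keyword N { leaf value ... }' blocks into {N: {leaf: value}}."""
    blocks = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        parts = line.split()
        if len(parts) == 3 and parts[0] == block_keyword and parts[2] == "{":
            current = blocks.setdefault(int(parts[1]), {})
        elif line == "}":
            current = None
        elif current is not None and len(parts) == 2:
            current[parts[0]] = parts[1]
    return blocks


def tsa_mismatches(traffic_classes):
    """Return the traffic classes whose local and remote TSA values differ."""
    return [
        tc
        for tc, leaves in sorted(traffic_classes.items())
        if leaves.get("local-tsa-value") != leaves.get("remote-tsa-value-configuration")
    ]
```

Calling `parse_blocks(output_text, "traffic-class")` and passing the result to `tsa_mismatches` returns an empty list when both peers advertise the same TSA value for every traffic class, as in the output above.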
Configuring DCBX
By default, DCBX is enabled on every interface. You can disable it administratively by setting admin-state to disable under the qos interfaces interface dcbx context. When DCBX is disabled, LLDP stops advertising the DCBX capability.
To configure DCBX on an interface, use the dcbx admin-state command in the qos interfaces interface context.
The following example enables DCBX on interface eth-1/4.
--{ candidate shared default }--[ ]--
# info with-context qos interfaces interface eth-1/4
qos {
    interfaces {
        interface eth-1/4 {
            interface-ref {
                interface ethernet-1/4
            }
            dcbx {
                admin-state enable
            }
        }
    }
}
The DCBX configuration is at interface level only. There is no system-level admin-state setting for DCBX.
Displaying DCBX state information
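To display the local and remote operational state for DCBX, use the info from state command. The following outputs show four interfaces: one with DCBX administratively disabled, one whose peer does not advertise DCBX, one that reports LLDP as operationally down, and one with DCBX operationally up.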
--{ state }--[ ]--
# info from state qos interfaces interface eth-1/4 dcbx
admin-state disable
oper-state down
oper-state-reason dcbx-admin-disabled
pfc-priority 0 {
oper-state down
remote-state remote-down
}
pfc-priority 1 {
oper-state down
remote-state remote-down
}
...
pfc-priority 7 {
oper-state down
remote-state remote-down
}
--{ state }--[ ]--
# info from state qos interfaces interface ethernet-2/3 dcbx
admin-state enable
oper-state down
oper-state-reason remote-dcbx-down
pfc-priority 0 {
oper-state up
remote-state remote-down
}
...
pfc-priority 7 {
oper-state up
remote-state remote-down
}
--{ state }--[ ]--
# info from state qos interfaces interface ethernet-1/3 dcbx
admin-state enable
oper-state down
oper-state-reason lldp-oper-state-down
pfc-priority 0 {
oper-state up
remote-state remote-down
}
...
pfc-priority 7 {
oper-state up
remote-state remote-down
}
--{ state }--[ ]--
# info from state qos interfaces interface ethernet-1/6 dcbx
admin-state enable
oper-state up
pfc-priority 0 {
oper-state down
remote-state remote-down
}
pfc-priority 1 {
oper-state down
remote-state remote-down
}
...
pfc-priority 7 {
oper-state down
remote-state remote-down
}
ets-tlv {
remote-credit-based-shaper 0
local-credit-based-shaper 0
remote-maximum-traffic-classes 8
local-maximum-traffic-classes 8
priority 0 {
remote-priority-assignment-configuration 0
remote-priority-assignment-recommendation 0
local-priority-assignment 0
}
priority 1 {
remote-priority-assignment-configuration 1
remote-priority-assignment-recommendation 1
local-priority-assignment 1
}
priority 2 {
remote-priority-assignment-configuration 2
remote-priority-assignment-recommendation 2
local-priority-assignment 2
}
...
priority 7 {
remote-priority-assignment-configuration 7
remote-priority-assignment-recommendation 7
local-priority-assignment 7
}
traffic-class 0 {
remote-traffic-class-bandwidth-assignment-configuration 0
remote-traffic-class-bandwidth-assignment-recommendation 0
local-traffic-class-bandwidth-assignment 27
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 2
}
traffic-class 1 {
remote-traffic-class-bandwidth-assignment-configuration 0
remote-traffic-class-bandwidth-assignment-recommendation 0
local-traffic-class-bandwidth-assignment 10
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 2
}
traffic-class 2 {
remote-traffic-class-bandwidth-assignment-configuration 0
remote-traffic-class-bandwidth-assignment-recommendation 0
local-traffic-class-bandwidth-assignment 18
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 2
}
traffic-class 3 {
remote-traffic-class-bandwidth-assignment-configuration 0
remote-traffic-class-bandwidth-assignment-recommendation 0
local-traffic-class-bandwidth-assignment 10
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 2
}
...
traffic-class 7 {
remote-traffic-class-bandwidth-assignment-configuration 0
remote-traffic-class-bandwidth-assignment-recommendation 0
local-traffic-class-bandwidth-assignment 3
remote-tsa-value-configuration 0
remote-tsa-value-recommendation 0
local-tsa-value 2
}
}
If an interface has DCBX enabled but does not receive a DCBX capability message from the peer (for example, because DCBX is disabled on the remote node), the DCBX oper-state is shown as down, with an oper-state-reason of remote-dcbx-down.
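The reason codes that appear in the preceding outputs can be turned into short troubleshooting hints. The Python sketch below is illustrative only: the mapping covers just the three reason codes shown in this section, the hint for lldp-oper-state-down is inferred from the name of the reason code, and the function name is hypothetical.

```python
# Hints for the oper-state-reason values shown in the outputs above.
REASON_HINTS = {
    "dcbx-admin-disabled": "DCBX admin-state is set to disable on the local interface",
    "remote-dcbx-down": "no DCBX capability message was received from the peer",
    # Inferred from the reason code name, not stated in this guide.
    "lldp-oper-state-down": "LLDP is operationally down on the interface",
}


def describe_dcbx_state(oper_state, reason=None):
    """Return a one-line description of the DCBX operational state."""
    if oper_state == "up":
        return "DCBX is operationally up"
    hint = REASON_HINTS.get(reason, "no hint available for this reason")
    return f"DCBX is down ({reason}): {hint}"
```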