Interaction with headless mode

If the PFCP path between the BNG-UP and the MAG-c fails and the BNG-UP becomes headless, use the following command to determine the behavior for active FSGs.

configure subscriber-mgmt up-resiliency fate-sharing-group-template path-restoration-state

When using the standby option for this command, the BNG-UP forces all active FSGs to become standby. This avoids any possibility of active/active UP behavior. However, if all the BNG-UPs become headless, for example, because of a routing issue to the MAG-c, all FSGs on all BNG-UPs become standby, and no forwarding is possible.

Note: Nokia recommends leaving this command set to the default (auto). Only use standby if the network cannot tolerate the active/active scenarios described in this section, even with the avoidance measures in place.

If you use the auto option, the BNG-UP uses a heuristic process to decide whether to keep the FSG active or move it to standby. The BNG-UP autonomously changes FSGs to standby if any of the following conditions are met:

No network instance is monitored for health.
At least one of the monitored network instances indicates health failure.
A GARP message is snooped from another BNG-UP.

This process distinguishes between an isolated BNG-UP becoming headless and all BNG-UPs becoming headless. When the BNG-UP estimates that all BNG-UPs are headless, it keeps the FSGs active. Otherwise, the BNG-UP moves the FSGs to standby, because the MAG-c activates another BNG-UP.
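The decision can be sketched in Python as follows. This is only an illustration of the logic described in this section, not the BNG-UP implementation. The names headless_fsg_state, HeadlessInputs, instance_health, and garp_from_peer are hypothetical. The sketch assumes that snooping a GARP message from another BNG-UP is a reason to move to standby, in line with the GARP snooping explanation in this section.

```python
from dataclasses import dataclass
from enum import Enum


class FsgState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class HeadlessInputs:
    # Hypothetical inputs: health per monitored network instance (True = healthy).
    instance_health: dict[str, bool]
    # Hypothetical input: True if a GARP message from another BNG-UP was snooped.
    garp_from_peer: bool


def headless_fsg_state(option: str, inputs: HeadlessInputs) -> FsgState:
    """Return the state of a currently active FSG while the BNG-UP is headless."""
    if option == "standby":
        # The standby option forces every active FSG to standby.
        return FsgState.STANDBY
    if option != "auto":
        raise ValueError(f"unknown option: {option!r}")

    # auto: move to standby if any one of the conditions is met.
    if not inputs.instance_health:
        # No network instance is monitored for health.
        return FsgState.STANDBY
    if not all(inputs.instance_health.values()):
        # At least one monitored network instance indicates health failure.
        return FsgState.STANDBY
    if inputs.garp_from_peer:
        # Another BNG-UP was activated by the MAG-c, so this node is isolated.
        return FsgState.STANDBY

    # No condition met: all BNG-UPs are probably headless, so keep forwarding.
    return FsgState.ACTIVE
```

For example, with the auto option, one healthy monitored network instance, and no snooped GARP message, the function returns FsgState.ACTIVE. Changing any single input to a failing value returns FsgState.STANDBY.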

The heuristic check of network health determines whether the failure is a more generic network failure, which is more likely to be local to the BNG-UP (for example, a network link failure). If the PFCP path fails but the local network is fine, the failure is probably central, and all BNG-UPs have become headless. In addition, if the network link is down, the system probably cannot forward the session traffic anyway.

The heuristic check of GARP snooping determines whether another BNG-UP became active while this BNG-UP is headless. If another BNG-UP sends a GARP message, the MAG-c has updated it to become active, which in turn means it cannot be headless. The BNG-UP can therefore estimate that the headless condition is confined to a single node and that it is safe to become standby.

These heuristic checks are best-effort and may fail to detect active/active conditions. However, by correctly setting routing metrics to differentiate between the fsg-active and fsg-active-path-restoration options, you can avoid the worst effects of active/active scenarios. If the headless BNG-UP has a worse metric or preference, only the non-headless active BNG-UP draws downlink traffic. The headless BNG-UP may still erroneously answer ARP requests and update forwarding databases in the access node for the FSG virtual MAC (vMAC). However, this is typically corrected quickly by downlink traffic from the non-headless BNG-UP, which uses the vMAC as the source MAC address.
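The effect of differentiating the metrics can be illustrated with a small Python sketch. It is not router configuration. The Route class, the best_route function, the node names, and the metric values 10 and 100 are all hypothetical, and the sketch assumes that the upstream router simply prefers the lowest metric for the same prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str   # advertising BNG-UP (hypothetical names)
    fsg_state: str  # "fsg-active" or "fsg-active-path-restoration"
    metric: int


# Hypothetical metric assignment: the headless (path-restoration) state
# is given a worse, that is higher, metric than the regular active state.
METRIC_BY_STATE = {
    "fsg-active": 10,
    "fsg-active-path-restoration": 100,
}


def best_route(routes: list[Route]) -> Route:
    """Pick the route an upstream router would prefer (lowest metric wins)."""
    return min(routes, key=lambda r: r.metric)


# Active/active case: both BNG-UPs advertise the same subscriber prefix.
routes = [
    Route("bng-up-1", "fsg-active-path-restoration",
          METRIC_BY_STATE["fsg-active-path-restoration"]),
    Route("bng-up-2", "fsg-active", METRIC_BY_STATE["fsg-active"]),
]
selected = best_route(routes)
```

In this sketch, selected is the route through bng-up-2, so the headless bng-up-1 does not draw downlink traffic even though it still considers the FSG active.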

The following are other risks of active/active scenarios:

When the aggregation network replicates unknown unicast packets to both BNG-UPs, both BNG-UPs forward these packets, leading to duplicate packets in the network. However, it is unlikely that the FSG virtual MAC is ever an unknown MAC. To further reduce the risk, disable unknown unicast replication.
Exact QoS enforcement is not guaranteed in this situation.

Standby FSGs remain standby during headless conditions. Whichever option you use (standby or auto), the BNG-UP reverts the FSG state to active after the headless condition ends.