Geo-redundancy

Geo-redundancy is the practice of replicating data and applications across multiple geographic locations. If the system fails in one location, a system in another location can continue to provide services without interruption.

The Fabric Services System is deployed to manage one or more data centers, which can require changes in the fabric on short notice or be dynamically integrated with OSS and CMS platforms. In such an environment, the availability of the Fabric Services System is crucial; if the system is unavailable, no changes can be made to the fabric, and the network and applications are at risk. Typically, failures do not originate in the application or in the infrastructure of a single rack, but in power outages or network outages in the data center where the system is deployed. For this reason, disaster recovery plans are put in place so that infrastructure and applications can recover quickly.

As part of a disaster recovery plan, the Fabric Services System supports geo-redundancy, in which a backup deployment of the Fabric Services System remains dormant until it is activated in response to a disaster at the active site. This standby site synchronizes from the active site all the data needed to quickly recover fabric management functionality when instructed to do so.

In the Fabric Services System, geo-redundancy is configured between two independent clusters. One cluster is configured as the active cluster and the other as the standby cluster.
Figure 1. Geo-redundancy in the Fabric Services System

After geo-redundancy has been configured on both the active and standby instances and the sync connection is active, changes to the active instance are replicated on the standby instance. A health check between the active and standby instances detects anomalies and, if present, generates alarms.
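The health check described above can be pictured with a minimal sketch. It is illustrative only and does not reflect the actual Fabric Services System implementation: the `SiteStatus` type, the `last_change_id` counter, and the alarm texts are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteStatus:
    """Hypothetical snapshot of one geo-redundancy instance."""
    role: str            # "active" or "standby"
    reachable: bool      # whether the instance answered the health probe
    last_change_id: int  # assumed counter of the last change applied


def health_check(active: SiteStatus, standby: SiteStatus,
                 sync_connected: bool, max_lag: int = 0) -> List[str]:
    """Return a list of alarm messages; an empty list means no anomaly."""
    alarms: List[str] = []
    if not sync_connected:
        alarms.append("sync connection is down")
    if not active.reachable:
        alarms.append("active instance is unreachable")
    if not standby.reachable:
        alarms.append("standby instance is unreachable")
    # The standby is considered out of sync when it trails the active
    # instance by more than the tolerated number of changes.
    lag = active.last_change_id - standby.last_change_id
    if lag > max_lag:
        alarms.append(f"standby is behind the active instance by {lag} change(s)")
    return alarms
```

In this sketch, a fully replicated standby with a live sync connection produces no alarms, while a broken connection or a standby that trails the active instance each produce one.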

The active site manages the fabric and accepts API and configuration changes. It also keeps the standby site in sync by pushing all required data to it.

The standby site continuously synchronizes all data from the active site so that it can quickly recover fabric management functionality if so instructed.

The standby site does not actively manage or monitor any fabric; it operates in read-only mode. The UI loads with the Geo-Redundancy page as its home page and only the Alarms and Geo-Redundancy pages are available on the main menu. The API is disabled, except for GET calls for alarms and geo-redundancy.
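The read-only behavior of the standby API can be summarized as a simple request filter. The following is a sketch under stated assumptions, not the product's code: the path prefixes `/alarms` and `/georedundancy` are hypothetical placeholders, and the real REST API paths may differ.

```python
# Hypothetical path prefixes; the actual REST API paths are not given here.
STANDBY_READABLE_PREFIXES = ("/alarms", "/georedundancy")


def is_request_allowed(role: str, method: str, path: str) -> bool:
    """Decide whether a site with the given role should serve the request.

    The active site accepts every request. The standby site serves only
    GET calls for alarms and geo-redundancy.
    """
    if role == "active":
        return True
    if method.upper() != "GET":
        return False
    # Match the prefix itself or any sub-path below it.
    return any(path == prefix or path.startswith(prefix + "/")
               for prefix in STANDBY_READABLE_PREFIXES)
```

For example, under these assumptions `is_request_allowed("standby", "GET", "/alarms")` is allowed, whereas a `POST` to any path on the standby site, or a `GET` for a fabric resource, is rejected.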

You can configure geo-redundancy using the UI or REST API.