Geo-redundancy operations
Synchronizing (sync)
- Data that can be relearned or regenerated within a reasonable time (minutes) if it was not fully synced.
- Software images are not synced because they are large files.
If a standby cluster becomes the new active cluster, it uploads images from the local image source. Software images are not transferred from the old active cluster to the new active cluster.
- Performance metrics or platform health metrics (Prometheus data) are not synchronized.
The options to restart and stop synchronizing are available only on the active site. When the active system stops synchronizing, it goes into the Sync Stopped state and becomes read-only. When the standby system stops synchronizing, it goes into the Sync Aborted state.
Audits
If the sync connection is down and the active system is reachable and operational, you can perform sync and reconcile operations to retain the active site. If you want to fail over to the standby system and make it active, you must first run an audit on the standby system.
During an audit, each service in the cluster verifies its own data, in sequence, to ensure that no data in its database is in an inconsistent state. An audit is not a verification of data between the clusters in a geo-redundant system. An audit is performed on the standby system. The system blocks any changes to system configuration while an audit is in progress.
- initiate failover: make the original standby system active and the original active system the standby
For instructions, see Initiating failover: switching between the active and standby clusters.
- initiate standalone operations for the standby system
For instructions, see Converting a geo-redundant system to a standalone system.
You can generate an audit report that shows, for each app, the actions that app took to correct inconsistencies in its data collections. If the data is already consistent, the report may contain little information.
In the rare event of an audit failure (that is, the state moves to Audit Fail), you can recover by reinstalling the configuration from backup. For instructions, see Backup and restore.
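The audit sequence described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names and the per-service callables are hypothetical, the state names are borrowed from the REST API state table, and the sketch assumes that the audit stops at the first failing service and that failover is permitted only once the audit has completed.

```python
from typing import Callable, Dict, List, Tuple


def run_audit(services: Dict[str, Callable[[], bool]]) -> Tuple[str, List[str]]:
    """Run each service's self-check in order; return (state, services audited)."""
    audited = []
    for name, verify in services.items():  # dicts preserve insertion order
        audited.append(name)
        if not verify():
            # Assumption: the audit stops at the first failing service.
            return "AUDIT_FAILED", audited
    return "AUDIT_DONE", audited


def can_fail_over(standby_state: str) -> bool:
    """Assumption: failover is allowed only after a completed audit."""
    return standby_state == "AUDIT_DONE"
```

Each service checks only its own database, which matches the point that an audit does not compare data between the two clusters.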
Reconciling
The reconcile operation initiates the replication of the data set from the active to the standby cluster. This action replaces the data set in the standby system with the data set from the active system.
- When the sync is first established (such as during the initial geo-redundancy configuration)
- When the sync connection recovers after an unintentional disruption
- This operation is available from the active system.
- The sync connection must be active before you can initiate the reconcile operation.
- There should be no pending workload jobs, deployments, or other operations that could modify the database in the background.
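The reconcile prerequisites listed above can be expressed as a simple pre-flight check. The sketch below is illustrative only: `SiteStatus`, its fields, and `reconcile_blockers` are hypothetical names and do not correspond to any documented API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteStatus:
    # All fields are hypothetical; they are not taken from the product's API.
    role: str                 # "active" or "standby"
    sync_connection_up: bool  # True when the sync connection is active
    pending_operations: int   # jobs, deployments, other DB-modifying work


def reconcile_blockers(status: SiteStatus) -> List[str]:
    """List the unmet prerequisites; an empty list means reconcile may start."""
    blockers = []
    if status.role != "active":
        blockers.append("reconcile must be initiated from the active system")
    if not status.sync_connection_up:
        blockers.append("sync connection is not active")
    if status.pending_operations > 0:
        blockers.append("operations that may modify the database are pending")
    return blockers
```

Returning the list of blockers, rather than a single boolean, lets an operator see every unmet prerequisite at once.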
REST API geo-redundancy operations
The operations allowed on the REST API vary depending on the current geo-redundancy state of the site.
State | Active site | Standby site |
---|---|---|
STANDALONE | Read/Write | Read/Write |
ACTIVE_SYNCING | Read/Write | Not applicable |
SYNC_STOPPED | Read/Write | Not applicable |
SYNC_ABORTED | Read-only | Read-only |
AUDIT | Not applicable | Read-only |
AUDIT_DONE | Not applicable | Read-only |
AUDIT_FAILED | Not applicable | Read-only |
STANBY_SYNCING | Not applicable | Read-only |
RECONCILE | Read-only | Read-only |
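
A client could encode the table above as a lookup before attempting a write. The following is a minimal sketch: the state names and access modes are copied verbatim from the table, while `ACCESS_BY_STATE`, `access_mode`, and `is_write_allowed` are hypothetical helper names, not part of the REST API.

```python
from typing import Optional

# Access per state, copied from the table: (active site, standby site).
# None stands for "Not applicable". State names are spelled as in the table.
ACCESS_BY_STATE = {
    "STANDALONE": ("Read/Write", "Read/Write"),
    "ACTIVE_SYNCING": ("Read/Write", None),
    "SYNC_STOPPED": ("Read/Write", None),
    "SYNC_ABORTED": ("Read-only", "Read-only"),
    "AUDIT": (None, "Read-only"),
    "AUDIT_DONE": (None, "Read-only"),
    "AUDIT_FAILED": (None, "Read-only"),
    "STANBY_SYNCING": (None, "Read-only"),
    "RECONCILE": ("Read-only", "Read-only"),
}


def access_mode(state: str, site: str) -> Optional[str]:
    """Return the access mode for a site ("active" or "standby") in a state."""
    if site not in ("active", "standby"):
        raise ValueError(f"unknown site: {site!r}")
    active, standby = ACCESS_BY_STATE[state]
    return active if site == "active" else standby


def is_write_allowed(state: str, site: str) -> bool:
    """True only when the table lists Read/Write for this state and site."""
    return access_mode(state, site) == "Read/Write"
```

For example, `is_write_allowed("RECONCILE", "active")` evaluates to `False`, because both sites are read-only while a reconcile is in progress.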