Geo-redundancy configuration
Deployment considerations
- Geo-redundancy works for one-, three-, or six-node deployments, but both clusters must use the same type of node deployment.
- The active and standby deployments must have the same number of nodes and the same resource configuration.
- The active and standby deployments must have the same version installed.
- Signing certificates must be aligned. For instructions, see Realigning certificates.
Networking considerations
Geo-redundancy has the following requirements and considerations for networking and connectivity between the active and standby sites:
- Synchronization uses the API service and should use the OAM network. When configuring geo-redundancy, make sure to use the FQDN or VIP of the other cluster on the OAM network.
- Connectivity between the active and standby clusters can be through a stretched L2 subnet between the sites, or routed with two different L2 subnets.
- The active and standby sites must use different IP and VIP addresses.
- Synchronization is supported over IPv4 and IPv6.
- The maximum allowed RTT latency between the active and standby sites is 100 ms, and a maximum of 50 ms is highly recommended. The lower the RTT latency, the better. A quick way to check the latency is shown after this list.
- The connection speed between the active and standby sites must be a minimum of 1 Gbps.
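For example, you can run a standard ping from the active deployer VM toward the standby site to sanity-check the RTT before enabling geo-redundancy. This is only a sketch; 10.x.x.11 is the example standby address used elsewhere in this chapter, so replace it with the FQDN or VIP of your standby cluster on the OAM network.
# Send 10 probes to the standby site and inspect the round-trip statistics.
ping -c 10 10.x.x.11
# In the "rtt min/avg/max/mdev" summary line, the avg value should be below 100 ms, and ideally below 50 ms.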
ZTP and DHCP handling after active failure
When the active site fails and the standby site has not been activated yet, the DHCP and ZTP capabilities of the platform are unavailable. At that time, SR Linux nodes cannot be rebooted, bootstrapped, or upgraded.
After the standby site has been made active, it runs the DHCP service and supports the ZTP process for SR Linux nodes.
Geo-redundancy configuration tasks
- Deploy the deployer VM on the active and standby sites. For instructions, see The Fabric Services System deployer VM in the Fabric Services System Installation Guide.
- Deploy the Fabric Services System on the active and standby sites, using the installation procedures provided in the Fabric Services System Installation Guide. Note: You can also upgrade an existing standalone deployment first, then set up the standby site.
- Configuring geo-redundancy information in deployer VMs
- Verifying that the setup is ready for geo-redundancy using the deployer VMs
- Realigning certificates
- Configuring geo-redundancy
Configuring geo-redundancy information in deployer VMs
Use this procedure to configure the deployer VMs with the remote site details. The steps in this procedure help you view the status of both the local and remote Fabric Services System clusters and determine whether both sites are configured correctly for geo-redundancy.
This procedure is optional and does not affect the geo-redundancy functionality of the platform. However, configuring the deployer VMs on the active and standby sites to know about each other helps with troubleshooting and with inspecting the infrastructure for discrepancies.
1. Configure passwordless SSH access locally on both the active and standby deployer VMs.
Enter the following command on both the active and standby deployer VMs:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
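If a deployer VM does not yet have an SSH key pair under /root/.ssh, generate one first. This is a sketch that assumes the default RSA key paths referenced in the command above:
# Create an RSA key pair with an empty passphrase at the default location (only needed if no key pair exists yet).
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa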
2. Configure passwordless SSH access from the local deployer VM to the remote deployer VM and vice versa.
Copy the contents of the /root/.ssh/id_rsa.pub file on the remote deployer and add them to the /root/.ssh/authorized_keys file of the local deployer, and do the same in the opposite direction.
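One possible way to exchange the keys is with ssh-copy-id, assuming password-based SSH login is still permitted between the deployer VMs; <remote_deployer_fqdn_or_ip> is a placeholder for the address of the other deployer VM.
# Run on the active deployer VM to append its public key to the standby deployer's authorized_keys;
# then repeat the command from the standby deployer toward the active deployer.
ssh-copy-id root@<remote_deployer_fqdn_or_ip>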
3. Add the necessary details from the remote site by copying the input.json file from the remote site.
Enter the following command:
fss-install.sh add-remote-deployer <input_json_of_remote_deployer>
Note: For this command to work, the deployernode.role field must be set to active in the input.json file of the active site and to standby in the input.json file of the standby site. This setting is needed so that each deployer VM knows which site is considered active and which is standby by default. (A hypothetical excerpt of this setting is shown after this procedure.)
4. Repeat step 3 from the remote deployer.
5. Verify the configuration by displaying the contents of the sites.json file on both deployer VMs.
[root@fss-deployersite01 ~]# cat /var/lib/fss/sites/sites.json
{
"local": {
"name": "site01",
"ipv4": "10.x.x.1",
"ipv6": "",
"accessip": "10.x.x.1",
"role": "active"
},
"remote": [
{
"name": "site02",
"ipv4": "10.x.x.11",
"ipv6": "",
"accessip": "10.x.x.11",
"role": "standby"
}
]
}
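For reference, the deployernode.role setting mentioned in step 3 corresponds to an entry similar to the following in the input.json of the active site. This is a minimal, hypothetical excerpt; the remaining fields follow the standard installation input.json layout, and on the standby site the same field is set to standby.
"deployernode": {
    "role": "active"
}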
Verifying that the setup is ready for geo-redundancy using the deployer VMs
The deployer VM provides the following tools that you can use to verify and display information about geo-redundancy configuration and status:
- cat /var/lib/fss/sites/sites.json: displays the local and remote clusters, with the access IP address, IP addresses, and role of each cluster
- /root/bin/fss-install.sh status-georedundancy: displays basic geo-redundancy status
- /root/bin/fss-install.sh status-georedundancy -v: displays detailed geo-redundancy status
- /root/bin/fss-install.sh status-georedundancy -t site01: displays the details of the Fabric Services System cluster, certificates, and applications
Status of geo-redundancy, basic output
In the output, Active(fd56:1:91:2::21) vs Standby(fd56:1:91:2::6a) reports the IP addresses used to connect to the active and standby sites.
[root@fss-deployersite01 ~]# /root/bin/fss-install.sh status-georedundancy
=====================================================
Sites Overview
=====================================================
+--------------+---------+--------+-------------+
| NAME | ROLE | STATUS | CONSISTENCY |
+--------------+---------+--------+-------------+
| site01(self) | active | GOOD | N/A |
| site02 | standby | GOOD | ERROR |
+--------------+---------+--------+-------------+
=====================================================
Active(fd56:1:91:2::21) vs Standby(fd56:1:91:2::6a)
=====================================================
+--------------+----------+
| NAME | STATUS |
+--------------+----------+
| NODES | GOOD |
| PASSWORDS | GOOD |
| CERTIFICATES | MISMATCH |
| VERSION | GOOD |
+--------------+----------+
[root@fss-deployersite01 ~]# /root/bin/fss-install.sh status-georedundancy -o yaml
Overview:
- NAME: site01(self)
ROLE: active
STATUS: GOOD
CONSISTENCY: N/A
- NAME: site02
ROLE: standby
STATUS: GOOD
CONSISTENCY: ERROR
standby-site02:
- NAME: NODES
STATUS: GOOD
- NAME: PASSWORDS
STATUS: GOOD
- NAME: CERTIFICATES
STATUS: MISMATCH
- NAME: VERSION
STATUS: GOOD
[root@fss-deployersite01 ~]#
If the CONSISTENCY column reports an error, use the -v option with the /root/bin/fss-install.sh status-georedundancy command to display more information. To display detailed information about a particular site, use the fss-install.sh status-georedundancy -t <site name> command.
Detailed geo-redundancy information
Use the /root/bin/fss-install.sh status-georedundancy -v command to display details about consistency errors. In the output, the Sites Overview section shows a consistency error, and the subsequent sections indicate the area with the consistency error. In the example below, the details about the error are shown in the Details about CERTIFICATES section. The serial number mismatch is not a severe issue, but the different node CAs between site01 and site02 should be addressed; see Realigning certificates.
[root@fss-deployersite01 ~]# /root/bin/fss-install.sh status-georedundancy -v
=====================================================
Sites Overview
=====================================================
+--------------+---------+--------+-------------+
| NAME | ROLE | STATUS | CONSISTENCY |
+--------------+---------+--------+-------------+
| site01(self) | active | GOOD | N/A |
| site02 | standby | GOOD | ERROR |
+--------------+---------+--------+-------------+
=====================================================
Active(fd56:1:91:2::21) vs Standby(fd56:1:91:2::6a)
=====================================================
-----------------------------------------------------
Details about fss VERSION
-----------------------------------------------------
+-------------------+--------------+--------+
| NAME | CHARTVERSION | STATUS |
+-------------------+--------------+--------+
| cert-manager | GOOD | GOOD |
| fss-logs | GOOD | GOOD |
| kafka | GOOD | GOOD |
| kafkaop | GOOD | GOOD |
| metallb | GOOD | GOOD |
| prod | GOOD | GOOD |
| rook-ceph | GOOD | GOOD |
| rook-ceph-cluster | GOOD | GOOD |
| traefik | GOOD | GOOD |
+-------------------+--------------+--------+
-----------------------------------------------------
Details about CERTIFICATES
-----------------------------------------------------
+--------------+--------+---------+---------------+---------+
| CERTSOURCE | ISSUER | SUBJECT | SERIAL-NUMBER | VALIDTO |
+--------------+--------+---------+---------------+---------+
| fss gui/rest | GOOD | GOOD | MISMATCH | GOOD |
| kafka | GOOD | GOOD | MISMATCH | GOOD |
| node CA | ERROR | ERROR | ERROR | ERROR |
+--------------+--------+---------+---------------+---------+
-----------------------------------------------------
Details about NODES
-----------------------------------------------------
+------------+-------------+
| NAME | CONSISTENCY |
+------------+-------------+
| master_cnt | GOOD |
| total_cnt | GOOD |
+------------+-------------+
-----------------------------------------------------
Details about PASSWORDS
-----------------------------------------------------
+------------+-----------------+-------------+
| APP | USER | CONSISTENCY |
+------------+-----------------+-------------+
| mongodb | root | GOOD |
| mongodb | fsp_user | GOOD |
| neo4j | root | GOOD |
| keycloak | master | GOOD |
| keycloak | fss | GOOD |
| keycloak | ztp | GOOD |
| postgresql | root | GOOD |
| postgresql | keycloak | GOOD |
| kafka | fss-kafka-admin | GOOD |
+------------+-----------------+-------------+
You can also display the output in YAML format:
[root@fss-deployersite01 ~]# /root/bin/fss_geo_redundancy.py status -v -o yaml
Overview:
- NAME: site01(self)
ROLE: active
STATUS: GOOD
CONSISTENCY: N/A
- NAME: site02
ROLE: standby
STATUS: GOOD
CONSISTENCY: ERROR
standby-site02:
HELMVERSION:
- NAME: cert-manager
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: fss-logs
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: kafka
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: kafkaop
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: metallb
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: prod
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: rook-ceph
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: rook-ceph-cluster
CHARTVERSION: GOOD
STATUS: GOOD
- NAME: traefik
CHARTVERSION: GOOD
STATUS: GOOD
CERTIFICATES:
- CERTSOURCE: fss gui/rest
ISSUER: GOOD
SUBJECT: GOOD
SERIAL-NUMBER: MISMATCH
VALIDTO: GOOD
- CERTSOURCE: kafka
ISSUER: GOOD
SUBJECT: GOOD
SERIAL-NUMBER: MISMATCH
VALIDTO: GOOD
- CERTSOURCE: node CA
ISSUER: ERROR
SUBJECT: ERROR
SERIAL-NUMBER: ERROR
VALIDTO: ERROR
NODES:
- NAME: master_cnt
CONSISTENCY: GOOD
- NAME: total_cnt
CONSISTENCY: GOOD
PASSWORDS:
- APP: mongodb
USER: root
CONSISTENCY: GOOD
- APP: mongodb
USER: fsp_user
CONSISTENCY: GOOD
- APP: neo4j
USER: root
CONSISTENCY: GOOD
- APP: keycloak
USER: master
CONSISTENCY: GOOD
- APP: keycloak
USER: fss
CONSISTENCY: GOOD
- APP: keycloak
USER: ztp
CONSISTENCY: GOOD
- APP: postgresql
USER: root
CONSISTENCY: GOOD
- APP: postgresql
USER: keycloak
CONSISTENCY: GOOD
- APP: kafka
USER: fss-kafka-admin
CONSISTENCY: GOOD
Realigning certificates
Use the fss-certificate.sh export [-d directory] utility to export the signing certificate files installed in the intended active system to a local directory. If you do not specify a directory, the certificates are exported to the local /root/userdata/certificates directory. Then, copy the needed certificate files to the intended standby and deploy them.
1. From the deployer of the intended active system, execute the fss-certificate.sh export command.
The following output shows that default signing certificates are present.
[root@fss-deployer ~]# /root/bin/fss-certificate.sh export
Certificates will be exported to /root/userdata/certificates
Default install generated signing certificates are in use for generating node certificates
Default install generated signing certificates are in use for nbi/gui/kafka
Server Certificates are generated and renewed using signing certificates for nbi/gui/kafka
In the next example, custom signing certificates are in use for the northbound, GUI, and Kafka interfaces.
Certificates will be exported to /root/userdata/certificates
Default install generated signing certificates are in use for generating node certificates
Custom signing certificates are in use for nbi/gui/kafka
Server Certificates are generated and renewed using signing certificates for nbi/gui/kafka
2. View the exported certificate files.
[root@fss-deployer ~]# ls -ltr /root/userdata/certificates/
total 28
-r-------- 1 root root 1675 Dec 13 03:59 current-nodesigning__rootCA.key
-r-------- 1 root root 1269 Dec 13 03:59 current-nodesigning__rootCA.pem
-r-------- 1 root root 1679 Dec 13 03:59 current-nbi__tls.key
-r-------- 1 root root 1501 Dec 13 03:59 current-nbi__tls.crt
-r-------- 1 root root 1874 Dec 13 03:59 current-nbi__ca.crt
-r-------- 1 root root 3272 Dec 13 03:59 current-nbisigning__tls.key
-r-------- 1 root root 1874 Dec 13 03:59 current-nbisigning__tls.crt
The following files were exported to the /root/userdata/certificates directory.
- current-nodesigning__rootCA.key: the signing certificate used by Cert-Manager to sign and generate certificates for managed nodes.
- current-nodesigning__rootCA.pem: the self-signed root certificate for managed nodes.
- current-nbi__tls.crt: the signing certificate used by Cert-Manager to sign and generate other certificates.
- current-nbi__tls.key: the private key.
- current-nbi__ca.crt: the signed certificate.
- current-nbisigning__tls.key: the private key for the signing certificate.
- current-nbisigning__tls.crt: the signed certificate.
Note: current-nbi__ca.crt and current-nbisigning__tls.crt are the same files.
3. Copy the contents of the current-nodesigning__rootCA.key and current-nodesigning__rootCA.pem files to a directory in the intended standby system.
The content of the current-nodesigning__rootCA.key file resembles the example below. Copy the entire contents shown to a file in the intended standby.
-----BEGIN RSA PRIVATE KEY-----
MIIJKAIBAAKCAgEAuMg5L2oizpf+g77atvmtuvc6Y4xBok27DbUDlYMBgkmy8Lj2
uolLD+WGlEODCrPcn+88IMG+xiHyuomu0vqMVF2UxJZD8K0AHrhRv6uDPXPr+D1e
SHj3MfntkQEcCHH0Bakk7sc0FhqgvgWNJWRXz+g/QI24BAhJx/lvEDtwrwnLg4Sg
ydTjd2D+a+XtcxoMvyWGxQdkqse/qVY1zibzBtmQKJ+3dXjOc6UHVVyrxP5fgWn2
ebw1hxG6rQdJ7HkFpwH3p/rYUHjrGXSxhgm7YEPNLXuuhxzW+maFxZ3VpyHwl/lE
vrGzMhTsBXogm+Jj0fZdbiGF4khJwNp6OaUhqHM37rabWCzMxki8uNR1pXkFdgHf
b9Ph5e0bfTix8L+keUmCSyfQdp404eKEsMmc3JFruH6oJU/9bdNESyHTZ2eK+F4g
+roe2Fu9TB1p64QUUtQv8k2s77qFiuqvaRL1hDNV4sNuIeNmKcu1n8dU+vRiGL2T
z95xqGYjYNx6SeNC/WCLBodyVAjPAayFRTB5y5K28x81Ip0Ozjz7+XdFFSV8amOa
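One way to copy the files is with scp from the active deployer VM, assuming SSH access between the deployer VMs (for example, the passwordless access configured earlier). In this sketch, /root/directory is the target directory used in the next step (create it on the standby deployer first), and the files are renamed to the standby-* names used there; <standby_deployer> is a placeholder for the standby deployer address.
# Copy and rename the node signing key and root CA to the intended standby deployer VM.
scp /root/userdata/certificates/current-nodesigning__rootCA.key root@<standby_deployer>:/root/directory/standby-nodesigning__rootCA.key
scp /root/userdata/certificates/current-nodesigning__rootCA.pem root@<standby_deployer>:/root/directory/standby-nodesigning__rootCA.pem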
4. From the deployer on the intended standby system, deploy the required certificate files.
In the standby system, assume that you copied the key to the standby-nodesigning__rootCA.key file and the root CA to the standby-nodesigning__rootCA.pem file.
Enter the following command to deploy the certificates:
[root@fss-standby-deployer ~]# /root/bin/fss-certificate.sh deploy-node-ca-certs --certificate /root/directory/standby-nodesigning__rootCA.key --pem /root/directory/standby-nodesigning__rootCA.pem
Note: If the deployer is already configured with the node signing certificates, use the --force option.
5. Verify that the deployed certificates in the intended standby system are correct.
[root@fss-standby-deployer ~]# /root/bin/fss-certificate.sh export -d directory
Certificates will be exported to /root/userdata/directory
Custom signing certificates are in use for generating node certificates
Custom server certificates are in use for nbi/gui/kafka
6. Repeat steps 3 and 4 as needed for other certificates.
In step 1, if the output shows that custom signing certificates are in use for the northbound, GUI, and Kafka interfaces, copy the needed files to the intended standby and deploy them.
Geo-redundancy parameters
Parameter | Description | Value
---|---|---
Local parameters: configures the active system | |
Name | Specifies the name of the local site. The local site is assigned the active role. | String
URL | Specifies the URL of the local system. | —
User and Password | Specifies the credentials that the remote system uses to log in to this local system. You can only configure geo-redundancy using the geored user account and you must provide the default geored password. You can change the password for the geored user as needed. | String
Active | Specifies whether the local cluster is active. | Enable on the active cluster
Verify Remote CA | Checks whether the certificates on the standby cluster are valid. If enabled, enter the Root CA for the standby site. | —
Remote parameters: configures the standby system | |
Name | Specifies the name of the remote site; the remote site is the standby site. | String
URL | Specifies the URL of the remote system. | IP notation
User and Password | Specifies the credentials to use to log in to the remote system. You can only configure geo-redundancy using the geored user account. You can change the password for the geored user as needed. | String
Active | Specifies that the remote cluster is the standby; must be disabled for the standby. | —
Verify Remote CA | Checks whether the certificates on the active cluster are valid. If enabled, enter the Root CA for the active cluster. |
Sync queue length | |
Sync Queue Length | Specifies the number of messages that can be buffered in the queue. Note: Available only over the API. | Integer. Default: 25000
Configuring geo-redundancy
- Perform this procedure during a maintenance window.
- The intended active and standby systems must be running the same Fabric Services System software version.
- The intended active and standby systems must be reachable.
- The standby system should not be running Digital Sandbox workloads.
- Update the password for the geored user. For instructions, see Resetting internal passwords.
- Be prepared to provide the following information for the active and standby Fabric Services System instances:
  - names for the active and standby systems
  - the URLs for the active and standby systems
  - the password for the geored user
1. Realign the certificates between the intended active and standby systems.
For instructions, see Realigning certificates.
2. From the main menu of the intended active system, select Geo-Redundancy.
The systems should automatically come up in the Syncing state. If they do not, from the Geo-Redundancy page of the active cluster, click Sync Start to initiate the sync connection.
The Geo-Redundancy page displays:
- the names of the local and remote sites
- the role of each site, either active or standby
WARNING: Before proceeding to the next step, ensure that there are no pending workload jobs, deployments, or any operations that could potentially modify the database in the background.
3. Reconcile data from the active system to the standby system.
From the Geo-Redundancy page of the active system, select Reconcile. The Sync Status shows Reconcile, then moves to Syncing.
This action replaces the data collection set in the standby system with the data collection set from the active system.
The Geo-Redundancy page displays:
- the names and roles (active or standby) of the clusters in the geo-redundant system; the status of the active cluster is Active syncing and the status of the standby cluster is Standby syncing
- the Fabric Services System services and the status of each service