Backup and restore

The Fabric Services System uses Kubernetes' ability to take a point-in-time (PiT) snapshot of a volume to implement backup and restore capabilities. A snapshot can be used to provision a new volume or to restore an existing volume to the previous state captured in the snapshot.

To take these point-in-time snapshots, the Container Storage Interface (CSI) standard is used. This standard allows volume snapshots to be created and deleted through the Kubernetes API, and new volumes to be pre-populated with the data from a snapshot through Kubernetes dynamic volume provisioning.
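As an illustration of the underlying mechanism (not a step in any procedure in this document), a CSI snapshot and a new volume pre-populated from it can be requested through the Kubernetes API; every name and size below is a placeholder:

```shell
# Illustrative only: request a CSI volume snapshot of an existing PVC.
kubectl apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-snapshot
spec:
  volumeSnapshotClassName: example-snapclass
  source:
    persistentVolumeClaimName: example-pvc
EOF

# Illustrative only: dynamically provision a new PVC pre-populated from the snapshot.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  dataSource:
    name: example-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
EOF
```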

The Fabric Services System uses the BorgBackup program to move the point-in-time snapshot to a remote location. For more information about the BorgBackup program, see https://borgbackup.readthedocs.io/en/stable/index.html#. Through BorgBackup, the backup solution supports:

  • Data de-duplication to lower the amount of storage used
  • Data encryption to secure the backup
  • (Optional) Remote storage of the backup over SSH on a remote server (using SSHFS)
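For example, once a backup exists, the encrypted repository can be inspected directly with the borg CLI; the repository location and passphrase below are placeholders for your own setup:

```shell
# Inspect a Borg repository (values are placeholders).
export BORG_PASSPHRASE='example-passphrase'      # passphrase used when the repository was created
borg list root@backupserver:backup-repo --short  # list the archives (backups) in the repository
borg info root@backupserver:backup-repo          # show original vs. deduplicated sizes
```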

Backup consistency in a micro-service architecture

A backup must be consistent across all the micro-services of an application to guarantee that it can be restored. To achieve consistency, ensure that all the micro-services have finished their write activity and that no new changes can reach the system before the snapshots are taken. The best way to achieve this is to block access to the services temporarily.

The backup script has an option (-s) that guarantees this consistency by scaling down all the micro-services for a short duration (the time needed to take the snapshots). Using this option prevents any changes from being made while the snapshot is taken. Nokia strongly recommends the use of this option; it is a requirement in production environments. If you omit this option, the restore functionality cannot be guaranteed.
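Conceptually, the -s option behaves like the following sketch; the namespace is an assumption and the real script manages this for you, so always prefer -s over quiescing services by hand:

```shell
ns=default   # assumption: namespace hosting the Fabric Services System micro-services

# record the current replica count of every deployment, then scale all to zero
kubectl get deploy -n "$ns" \
  -o jsonpath='{range .items[*]}{.metadata.name}={.spec.replicas}{"\n"}{end}' > /tmp/replicas.txt
kubectl scale deploy --all --replicas=0 -n "$ns"

# ... the point-in-time snapshots are taken while nothing can write ...

# scale every deployment back to its recorded replica count
while IFS='=' read -r name count; do
  kubectl scale deploy "$name" --replicas="$count" -n "$ns"
done < /tmp/replicas.txt
```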

Additionally, execute a backup only when:

  • No background tasks, such as configuration generation or the deployment of a fabric or workload intent, are running.
  • No Digital Sandbox Fabrics are present and no Digital Sandbox workloads are running.

The deployer VM must also be available during the backup procedure so that the Kubernetes cluster has access to the registry with the images of all services, in case they are needed.

Supported restore scenarios

The restore operation from a backup always requires a fresh installation of the entire Fabric Services System deployment. The restore takes place as part of the installation of a fresh environment. The restore process has the following requirements:
  • The same version of the Fabric Services System must be used; for example, if a backup was taken from a deployment with version 22.4.1-90, the restore must use that same version.
  • The Fabric Services System nodes on which the restore is executed must have the same IP addresses and FQDNs as the environment from which the backup was taken.
  • The input.json file that was used for the installation of the environment from which the backup was taken must be used for the restore procedure.

Backup and restore scripts

The backup and restore commands are executed from one of the Kubernetes nodes, preferably one of the master nodes. The node on which the command is executed must have passwordless (key-based) SSH access to all other Kubernetes nodes as the root user. If a remote server is used through the FSS_BACKUP_REPOS environment variable, passwordless (key-based) SSH is also required to the remote server.
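Passwordless SSH can be set up with standard OpenSSH tooling; the host names below are placeholders:

```shell
# One-time setup of key-based SSH from the node that runs the scripts.
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519   # skip if a key already exists
for host in node2 node3 backupserver; do           # all other K8s nodes, plus the remote backup server
  ssh-copy-id -i ~/.ssh/id_ed25519.pub root@"$host"
  ssh root@"$host" true                            # verify that login works without a password
done
```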

You use the fss-backup.py script for the backup operation.
fss-backup.py [-h] -b backup_name [-s] [-i] [-l [location]] [-c]
You use the fss-restore.py script for the restore operation.
fss-restore.py [-h] -b backup_name [-l [location]] [-c]
Table 1. Parameter descriptions
Option Specifies
-h Displays the help screen, which describes the usage of the script and its CLI arguments.
-b backup_name Name of the backup; the name must consist of lowercase alphanumeric characters, '-', or '.', and must start and end with an alphanumeric character.
-s Enable automatic scale down of the Fabric Services System deployment before taking the snapshots. This option guarantees that all services are temporarily down and a consistent backup of the entire environment is taken. This option is strongly recommended.
-i Include the logs volume in the backup.
-l location Full path to the folder where the backup file is stored, for example /data/backups. This option takes precedence over the FSS_BACKUP_REPOS environment variable. This folder is local to the Kubernetes node on which the backup command is executed, but it can be mounted from a remote location such as NFS storage. Use the following format:
  • if the backup file is stored in a local directory: /<full path>/<directory>, for example, /data/backups
  • if the backup file is stored in a remote location: <user>@<IP address or hostname>:/<full path>/<directory>
-c Clean up a failed backup.
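The -b naming rule maps to a simple regular expression; the following is a minimal sketch of validating a name before running the script (the helper function is illustrative, not part of the product):

```shell
# Returns success when the name consists of lowercase alphanumerics, '-' or '.',
# and starts and ends with an alphanumeric character.
valid_backup_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

valid_backup_name 0427-fss1 && echo "0427-fss1 is a valid backup name"
valid_backup_name Backup_1  || echo "Backup_1 is not (uppercase letter and '_')"
```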
The scripts read the following environment variables:
  • BORG_PASSPHRASE - a required environment variable; this password string is required by the BorgBackup utility to encrypt or decrypt backup files.

    Keep this passphrase safe and secure; if you lose it, access to the backup data is lost as well.

  • FSS_BACKUP_REPOS - specifies the location where the backup files are stored. This can be a local directory or a remote location. You can use the -l option to override the setting of the FSS_BACKUP_REPOS variable.
  • BACKUP_RESTORE_LOGS - specifies the log location for the backup and restore operations. If it is not configured, logs are stored in the logs directory from which the scripts are executed.
In the following example, compute6 is used as the backup repository.
export BORG_PASSPHRASE='XYZ2022_nOkIaNeTwOrKs_FsS1!&&123'
export FSS_BACKUP_REPOS=root@compute6:backup-repo
export BACKUP_RESTORE_LOGS=/root/backup-restore/logs

Backing up

The resulting backup is a full backup.
  1. Run the backup script on the master node of the Kubernetes cluster that needs to be backed up.
    /root/backup-restore/fss-backup.py -b <backup_name> [-s] [-i] [-l <location>]
    /root/backup-restore/fss-backup.py -b 0427-fss1 -s -l /root/backups
  2. Display the status of the backup operation.
    You can view the status of the backup in the status.json file in the logs directory.
    [root@master-node backup-restore]# cat logs/0427-fss1-logs/status.json
    {
      "fss_version": "v22.4.1-52",
      "status": "backup_completed",
      "completed_pvcs": [
        "data-dev-postgresql-0",
        "datadir-0-dev-cp-kafka-0",
        "datadir-0-dev-cp-kafka-1",
        "datadir-0-dev-cp-kafka-2",
        "datadir-dev-cp-zookeeper-0",
        "datadir-dev-cp-zookeeper-1",
        "datadir-dev-cp-zookeeper-2",
        "datadir-dev-neo4j-core-0",
        "datalogdir-dev-cp-zookeeper-0",
        "datalogdir-dev-cp-zookeeper-1",
        "datalogdir-dev-cp-zookeeper-2",
        "dev-fss-dhcpd",
        "dev-fss-dhcpd-lease",
        "dev-fss-image-mgmtstack",
        "dev-fss-node-checkpoints",
        "dev-fss-nodecfg",
        "dev-mongodb"
      ],
      "failed_pvcs": [],
      "failure_msg": "",
      "last_updated_at": "2022-04-28T04:09:30.453392+00:00"
    }
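Automation that wraps the backup script can poll this file until the status field reaches backup_completed; the following is a minimal sketch of extracting the field with plain sed (the sample file content is taken from the output above; jq would work equally well if installed):

```shell
status_file=/tmp/status.json   # stands in for logs/<backup_name>-logs/status.json
cat > "$status_file" <<'EOF'
{
  "status": "backup_completed",
  "failed_pvcs": []
}
EOF
# pull the value of the "status" field out of the JSON
status=$(sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' "$status_file")
echo "$status"   # backup_completed
```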
  3. Verify the backup files in the repository.
    [root@master-node backups]# borg list 0427-fss1 --short
    manifests
    data-dev-postgresql-0
    datadir-0-dev-cp-kafka-0
    datadir-0-dev-cp-kafka-1
    datadir-0-dev-cp-kafka-2
    datadir-dev-cp-zookeeper-0
    datadir-dev-cp-zookeeper-1
    datadir-dev-cp-zookeeper-2
    datadir-dev-neo4j-core-0
    datalogdir-dev-cp-zookeeper-0
    datalogdir-dev-cp-zookeeper-1
    datalogdir-dev-cp-zookeeper-2
    dev-fss-dhcpd
    dev-fss-dhcpd-lease
    dev-fss-image-mgmtstack
    dev-fss-node-checkpoints
    dev-fss-nodecfg
    dev-mongodb
    [root@master-node backups]#

Restoring a backup and installing the Fabric Services System application

The restore process consists of the following high-level tasks:
  1. Set up new Fabric Services System nodes with the same IPs and FQDNs.
  2. Install the Kubernetes cluster only (not the application) using the same input.json that was used for the original environment.
  3. Restore the backup data using one of the Kubernetes nodes (preferably a master node).
  4. Install the Fabric Services System application components using the same input.json that was used for the original environment.

Setting up new Fabric Services System nodes

The restore of a Fabric Services System backup must be executed in a clean environment that uses the same IPs and FQDNs as the original environment. You can use the procedures in the Fabric Services System Software Installation Guide to set up such an environment.

  1. Optional: Complete the procedure "Deploying and configuring the Fabric Services System deployer VM".
    Note: Complete this procedure only if the deployer VM was also lost or needs to be reinstalled. Ensure that you are using the appropriate version, that is, the same version that was used to install the original environment.
  2. Complete the procedure for your deployment scenario.
    • "Virtual machine-based Installation: Using the Fabric Services System base OS image"
    • "Bare metal-based installation: Preparing the Fabric Services System nodes"
    Important: When the nodes have been installed, do not execute "Installing Fabric Services System" in the Fabric Services System Software Installation Guide.

Installing the Kubernetes cluster

Execute this procedure on the deployer VM using the same input.json that you used for the installation of the original deployment.
  1. Initiate the setup.
    [root@fss-deployer ~]$ /root/bin/fss-install.sh configure sample-input.json
    Important: Do not execute the fss-install.sh script as suggested by the output of the preceding command.
  2. Start the installation of Kubernetes.
    [root@fss-deployer ~]$ /root/bin/setup-k8s.sh

    The installation time varies depending on the capacity of your system.

Restoring a backup

  • Ensure that the backup folder created by the backup script is available to the restore command on the Kubernetes node on which you are executing this procedure. You can configure this through the FSS_BACKUP_REPOS environment variable pointing to a remote server using SSH/SSHFS or by putting the backup folder on the Kubernetes node itself and using the -l option, similar to how the backup was taken.

  • Ensure that the BORG_PASSPHRASE environment variable is set to the correct passphrase, otherwise, the restore will fail.
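These two prerequisites can be checked with a quick fail-fast snippet before starting (the repository value is whatever you used at backup time):

```shell
# Abort early with a clear message if either prerequisite is missing.
: "${BORG_PASSPHRASE:?set BORG_PASSPHRASE to the passphrase used when the backup was taken}"
: "${FSS_BACKUP_REPOS:?set FSS_BACKUP_REPOS (or plan to pass -l to fss-restore.py)}"
borg list "$FSS_BACKUP_REPOS" --short   # the backup name you will pass to -b must appear here
```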

  1. Restore the backup on the target Kubernetes node.
    /root/backup-restore/fss-restore.py -b backup_name [-l location]
    /root/backup-restore/fss-restore.py -b 0427-fss1 -l /root/backups
    The script does the following:
    1. Checks if the specified backup file name exists in the specified location in the Borg repository; if the file is not present, the script terminates
    2. Mounts the Borg repository locally to read the backup content
    3. Creates PVCs from the YAML files in the manifest archive in the Borg repository that contains the PVC definition for all the original PVCs; also creates PVs of the same size as the original PVs in the source cluster
    4. Creates a mount pod for each PVC on the node where the restore script is executed, and mounts the newly created empty PVCs
    5. Copies the volume content of the mounted backup to the corresponding PVC volume
    6. Deletes the PVC mount pod; the PVCs and PVs are still present and point to the new data volume where data is copied
  2. Check the status of the restore operation in the status.json file in the logs/<backup-name>-restore-logs directory.
    [root@compute1 backup-restore]# cat logs/0501-3-fssbackup-restore-logs/status.json
    {
      "fss_version": "v22.4.1-52",
      "status": "restore_completed",
      "restored_pvcs": [
        "datadir-mvsrlfsp05-cp-zookeeper-2",
        "datadir-mvsrlfsp05-cp-zookeeper-0",
        "mvsrlfsp05-fss-nodecfg",
        "datadir-mvsrlfsp05-mongodb-secondary-0",
        "datalogdir-mvsrlfsp05-cp-zookeeper-0",
        "mvsrlfsp05-fss-image-mgmtstack",
        "mvsrlfsp05-fss-dhcpd-lease",
        "datalogdir-mvsrlfsp05-cp-zookeeper-1",
        "datadir-mvsrlfsp05-neo4j-core-0",
        "mvsrlfsp05-fss-node-checkpoints",
        "datalogdir-mvsrlfsp05-cp-zookeeper-2",
        "mvsrlfsp05-fss-dhcpd",
        "datadir-mvsrlfsp05-cp-zookeeper-1",
        "datadir-mvsrlfsp05-mongodb-primary-0",
        "mvsrlfsp05-fss-logs-volume",
        "data-mvsrlfsp05-postgresql-0",
        "datadir-0-mvsrlfsp05-cp-kafka-1",
        "datadir-0-mvsrlfsp05-cp-kafka-2",
        "datadir-0-mvsrlfsp05-cp-kafka-0"
      ],
      "failure_msg": "",
      "last_updated_at": "2022-05-01T19:10:26.207653+00:00"
    }[root@mvsrlfsp05-compute1 backup-restore]#
    
  3. Verify the restored files.
    Execute the kubectl get pvc command. The output should match the output of the borg list command in step 3 of Backing up.
    When the output shows that the PVCs have been created, you can install the Fabric Services System application.

Installing the Fabric Services System application

Now that the backup data has been restored, you can install the Fabric Services System in the Kubernetes cluster. Use the deployer VM from which you executed the commands to install the Kubernetes cluster.

  1. Initiate the setup.
    [root@fss-deployer ~]$ /root/bin/fss-install.sh configure sample-input.json
    Important: Do not execute the fss-install.sh script as suggested by the output of the preceding command.
  2. Start the installation of the Fabric Services System application.
    The installation time varies depending on the capacity of your system.
    [root@fss-deployer ~]$ /root/bin/fss-app-install.sh
At the end of this procedure, the Fabric Services System application has been restored from your backup and is ready for use.