Backup and restore

The Fabric Services System uses Kubernetes' ability to take a point-in-time (PiT) snapshot to implement backup and restore capabilities. A snapshot represents a point-in-time copy of a volume. A snapshot can be used to provision a new volume or to restore the existing volume to the previous state captured in the snapshot.

To take these point-in-time snapshots, the Container Storage Interface (CSI) standard is used. This standard allows the creation and deletion of volume snapshots via the Kubernetes API, and the creation of new volumes pre-populated with the data from a snapshot via Kubernetes dynamic volume provisioning.
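For illustration, a CSI snapshot is requested through the Kubernetes API with a VolumeSnapshot object similar to the following generic sketch; the names and the snapshot class are hypothetical, and the Fabric Services System backup script creates and manages these objects for you:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-snapshot                          # hypothetical name
spec:
  volumeSnapshotClassName: example-snapshot-class # depends on the CSI driver in use
  source:
    persistentVolumeClaimName: example-pvc        # the volume to snapshot
A new volume can then be provisioned from this snapshot by referencing it in the dataSource field of a PersistentVolumeClaim.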

The Fabric Services System uses the BorgBackup program to move the point-in-time snapshots to a remote location. For more information about BorgBackup, see the BorgBackup documentation. Through BorgBackup, the backup solution supports:

  • data de-duplication to lower the amount of storage used
  • data encryption to secure the backup
  • (optional) storage of the backup on a remote server, accessed over SSH (using SSHFS)
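For example, the BorgBackup workflow that underlies the backup solution conceptually looks like the following; the host and repository path are hypothetical, and the fss-backup.py script initializes and writes the repository for you. borg init creates an encrypted repository and borg list shows the archives it contains:
borg init --encryption=repokey root@backup-server:backup-repo/my-backup
borg list root@backup-server:backup-repo/my-backup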

Best practices and considerations

Nokia recommends that you follow the best practices listed below when backing up the Fabric Services System:
  • Store the backup data on an external system for extra safety. If you store the backup on the deployer VM and the deployer VM is lost, you no longer have access to your backups and cannot restore.
  • Schedule the backup to run on a regular basis, for example every night, so that a recent backup is always available if a restore is needed.
Be aware of the following considerations when deploying the Fabric Services System:
  • The backup process brings down the application for 5-10 minutes to create a consistent backup, as described in Backup consistency in a micro-service architecture. Take this downtime into account when planning the regular backup window.
  • While the application is down, no changes can be made to the environment. New alarms are raised only after the application has started again; alarms that occur during the downtime and are resolved within that same downtime are not raised, because they were already resolved.

Backup consistency in a micro-service architecture

A backup must be consistent across all the micro-services of an application to guarantee that it can be restored. To achieve consistency, before the snapshots are taken, ensure that all the micro-services have finished their write activity and that no further changes are accepted by, or incoming to, the system. The best way to achieve this is to block access to the services temporarily.

The backup script has an option (-s) that guarantees this consistency by scaling down all the micro-services for a short duration (the time needed to take the snapshots). Using this option prevents any changes from being made while the snapshots are taken. Nokia strongly recommends the use of this option; it is a requirement in production environments. If you omit this option, the restore functionality cannot be guaranteed.
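For reference, scaling a Kubernetes deployment down and back up is done with commands of the following form (the deployment name, namespace, and replica count are placeholders); the -s option performs the equivalent of this automatically for all Fabric Services System services, so do not scale services manually as part of a backup:
kubectl scale deployment <deployment-name> --replicas=0 -n <namespace>
kubectl scale deployment <deployment-name> --replicas=<original-count> -n <namespace>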

Additionally, execute a backup only when no background tasks are running, such as configuration generation or the deployment of a fabric or workload VPN intent.

The deployer VM is used for the backup procedure and ensures that the Kubernetes cluster has access to the registry with the images of all services, in case they are needed.

Supported restore scenarios

The restore operation from a backup always requires a fresh installation of the entire Fabric Services System deployment. The restore takes place as part of the installation of a fresh environment. The restore process has the following requirements:
  • The same version of the Fabric Services System must be used; for example, if a backup is taken from a deployment with version 22.8.1, the restore has to use that same version.
  • The Fabric Services System nodes on which the restore is executed must have the same IP addresses and FQDNs as the environment from which the backup was taken.
  • The input.json file that was used by the deployer VM for the installation of the environment from which the backup was taken must be used for the restore procedure.

Backup and restore scripts

You execute backup and restore commands from the deployer VM. The deployer VM must have passwordless (key-based) SSH access as the root user to all other Kubernetes nodes. If a remote server is used through the FSS_BACKUP_REPOS environment variable, passwordless (key-based) SSH access to that remote server is also required.
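If key-based SSH access still has to be configured, you can set it up from the deployer VM with the standard OpenSSH tools, for example (the node name is a placeholder; repeat the second command for each Kubernetes node and, if used, for the remote backup server):
ssh-keygen -t rsa
ssh-copy-id root@<node-fqdn>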

You use the fss-backup.py script for the backup operation.
fss-backup.py [-h] -b backup-name [-s] [-i] [-l [location]] [-c]
You use the fss-restore.py script for the restore operation.
fss-restore.py [-h] -b backup-name [-l [location]] [-c]

The backup and restore scripts are located in the /root/bin/backup-restore directory of the deployer VM.

Table 1. Parameter descriptions
Option Description
-h Displays the help screen, which describes the usage of the script and its CLI arguments.
-b backup-name Name of the backup file; it must consist of lowercase alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
-s Enables automatic scale-down of the Fabric Services System deployment before the snapshots are taken. This option guarantees that all services are temporarily down and that a consistent backup of the entire environment is taken. This option is strongly recommended.
Note: If you run the fss-backup.py command with the -s option, expect deviations in the fabric intents for about 1 to 2 minutes after the pods are scaled up.
-i Includes the logs volume in the backup.
-l location Full path to the folder where the backup file is stored, for example /data/backups. This option overrides the FSS_BACKUP_REPOS environment variable. The folder is local to the Kubernetes node where the backup command is executed, but it can be mounted from a remote location such as NFS storage. Use the following format:
  • if the backup file is stored in a local directory: /<full path>/<directory>, for example /data/backups
  • if the backup file is stored in a remote location: user@<IP address or hostname>:/<full path>/<directory>
-c Cleans up a failed backup.
The scripts read the following environment variables:
  • BORG_PASSPHRASE - a required environment variable; this password string is required by the BorgBackup utility to encrypt or decrypt backup files.

    Keep this passphrase safe and secure; if you lose it, access to the backup data is lost as well.

  • FSS_BACKUP_REPOS - specifies the location where the backup files are stored; it can be a local directory or a remote location. You can use the -l option to override the setting of the FSS_BACKUP_REPOS variable.
  • BACKUP_RESTORE_LOGS - specifies the log location for the backup and restore operations. If it is not configured, logs are stored in the logs directory where the scripts are executed.
In the following example, compute6 is used as the remote host for the backup repository.
export BORG_PASSPHRASE='XYZ2022_nOkIaNeTwOrKs_FsS1!&&123'
export FSS_BACKUP_REPOS=root@compute6:backup-repo
export BACKUP_RESTORE_LOGS=/root/bin/backup-restore/logs
You can set environment variables in the ~/.env file on the deployer VM:
[root@fss-deployer backup-restore]# cat ~/.env
KUBECONFIG=/var/lib/fss/config.fss
BORG_PASSPHRASE='XYZ2022_B0RgPasSheRE_FsS1!&&123'
FSS_BACKUP_REPOS=root@compute6:backup-repo
BACKUP_RESTORE_LOGS=/root/bin/backup-restore/logs
[root@fss-deployer backup-restore]#

Backing up

Use this procedure to back up the Fabric Services System deployment. The resulting backup is always a full backup.
  1. Run the backup script on the deployer VM of the Fabric Services System deployment that you are backing up.
    /root/bin/backup-restore/fss-backup.py  -b <backup_name> [-s] [-i] [-l <location>]
    /root/bin/backup-restore/fss-backup.py -b 0810-fss -s
    Note: If you run the fss-backup.py command with the -s option, expect deviations in the fabric intents for about 1 to 2 minutes after the pods are scaled up.
  2. Display the status of the backup operation.
    You can view the status of the backup in the status.json file in the logs directory.
    [root@fss-deployer backup-restore]# cat logs/0810-fss-logs/status.json
    {
      "fss_version": "v22.8.0-5",
      "status": "backup_completed",
      "completed_pvcs": [
        "data-prod-postgresql-0",
        "datadir-prod-mongodb-primary-0",
        "datadir-prod-mongodb-secondary-0",
        "datadir-prod-neo4j-core-0",
        "prod-fss-dhcpd",
        "prod-fss-dhcpd-lease",
        "prod-fss-image-mgmtstack",
        "prod-fss-node-checkpoints",
        "prod-fss-nodecfg"
      ],
      "failed_pvcs": [],
      "failure_msg": "",
      "last_updated_at": "2022-08-10T17:51:23.521569+00:00"
    }
  3. Verify the backup files in the repository.
    [root@compute1 backup-repo]# borg list 0810-fss --short
    manifests
    data-prod-postgresql-0
    datadir-prod-mongodb-primary-0
    datadir-prod-mongodb-secondary-0
    datadir-prod-neo4j-core-0
    prod-fss-dhcpd
    prod-fss-dhcpd-lease
    prod-fss-image-mgmtstack
    prod-fss-node-checkpoints
    prod-fss-nodecfg
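    Optionally, you can also display repository statistics, such as the original and deduplicated sizes, with the standard borg info command, executed from the same backup repository location (this is a generic BorgBackup command, not part of the Fabric Services System scripts):
    borg info 0810-fss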
    

Restore a backup and install the Fabric Services System application

The restore process consists of the following high-level tasks:
  1. Setting up new Fabric Services System nodes with the same IP addresses and FQDNs
  2. Installing the Kubernetes cluster only (not the application) using the same input.json that was used for the original environment
  3. Restoring the backup data from the deployer node
  4. Installing the Fabric Services System application components using the same input.json that was used for the original environment
  5. For Digital Sandbox only, restoring Digital Sandbox fabrics

Setting up new Fabric Services System nodes

The restore of the Fabric Services System backup has to be executed in a clean environment that uses the same IP addresses and FQDNs as the original environment. You can use the procedures from the Fabric Services System Software Installation Guide to set up such an environment.

  1. Optional: Complete the procedure "Deploying and configuring the Fabric Services System deployer VM".
    Note: Complete this procedure only if the deployer VM is also lost or needs to be reinstalled. Ensure that you are using the appropriate version, that is, the same version that was used to install the original environment.
  2. Create and configure the Fabric Services System virtual machine nodes.
    You must use the Fabric Services System base OS image. For instructions, see Preparing the Fabric Services System virtual machine nodes.
    Note: When the nodes have been installed, do not execute "Installing Fabric Services System" in the Fabric Services System Software Installation Guide.

Installing the Kubernetes cluster

Execute this procedure on the deployer VM using the same input.json that you used for the installation of the original deployment.
  1. Initiate the setup.
    [root@fss-deployer ~]$ /root/bin/fss-install.sh configure sample-input.json
    Note: Do not execute the fss-install.sh script as suggested by the output of the preceding command.
  2. Start the installation of Kubernetes.
    [root@fss-deployer ~]$ /root/bin/setup-k8s.sh

    The installation time varies depending on the capacity of your system.
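When the Kubernetes installation has completed, you can verify that all nodes of the cluster are up before restoring the backup data; the command below is generic kubectl and assumes that KUBECONFIG points at the cluster configuration (for example /var/lib/fss/config.fss, as in the ~/.env example shown earlier):
[root@fss-deployer ~]$ kubectl get nodes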

Restoring a backup

Use this procedure to restore a previously saved backup.
  • Ensure that the backup folder created by the backup script is available to the restore command on the deployer VM on which you are executing this procedure. You can configure this either through the FSS_BACKUP_REPOS environment variable pointing to a remote server (using SSH/SSHFS), or by placing the backup folder on the Kubernetes node itself and using the -l option, just as when the backup was taken.

  • Ensure that the BORG_PASSPHRASE environment variable is set to the correct passphrase; otherwise, the restore fails. See the example after this list.
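For example, on the deployer VM, set the variables to the values that were used when the backup was taken (the values shown below are the ones from the earlier backup example):
export BORG_PASSPHRASE='XYZ2022_nOkIaNeTwOrKs_FsS1!&&123'
export FSS_BACKUP_REPOS=root@compute6:backup-repo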

  1. Restore the backup from the deployer VM.
    /root/bin/backup-restore/fss-restore.py -b backup_name [-l location]
    /root/bin/backup-restore/fss-restore.py -b 0810-fss
    The script does the following:
    1. Checks if the specified backup filename exists in the specified location in the Borg repository; if the file is not present, the script terminates
    2. Mounts the Borg repository locally to read the backup content
    3. Creates PVCs from the YAML files in the manifest archive in the Borg repository, which contains the PVC definitions for all the original PVCs; also creates PVs of the same size as the original PVs in the source cluster
    4. Creates a mount pod for each PVC on the node where the restore script is being executed, then mounts the newly created empty PVCs
    5. Copies the volume content of the mounted backup to the corresponding PVC volume
    6. Deletes the PVC mount pod; the PVCs and PVs are still present and point to the new data volume where data is copied
  2. Check the status of the restore operation in the status.json file in the logs/<backup-name>-restore-logs directory.
    [root@fss-deployer backup-restore]# cat logs/0810-fss-restore-logs/status.json
    {
      "fss_version": "v22.8.0-5",
      "status": "restore_completed",
      "completed_pvcs": [
        "data-prod-postgresql-0",
        "datadir-prod-mongodb-primary-0",
        "datadir-prod-mongodb-secondary-0",
        "datadir-prod-neo4j-core-0",
        "prod-fss-dhcpd",
        "prod-fss-dhcpd-lease",
        "prod-fss-image-mgmtstack",
        "prod-fss-node-checkpoints",
        "prod-fss-nodecfg"
      ],
      "failed_pvcs": [],
      "failure_msg": "",
      "last_updated_at": "2022-08-10T17:51:23.521569+00:00"
    }
  3. Verify the restored files.
    Execute the kubectl get pvc command; an example is shown after this step. The output should match the output of the borg list command in step 3 of Backing up.
    When the output shows that the PVCs have been created, you can install the Fabric Services System application.
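    For example, the restored PVCs can be listed across all namespaces with a generic kubectl command (the Fabric Services System namespace depends on your deployment):
    [root@fss-deployer ~]$ kubectl get pvc -A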

Installing the Fabric Services System application

Now that the backup data has been restored, you can install the Fabric Services System in the Kubernetes cluster. Use the deployer VM from which you executed the commands to install the Kubernetes cluster.

  1. Initiate the setup.
    [root@fss-deployer ~]$ /root/bin/fss-install.sh configure sample-input.json
    Important: Do not execute the fss-install.sh script as suggested by the output of the preceding command.
  2. Start the installation of the Fabric Services System application.
    [root@fss-deployer ~]$ /root/bin/fss-app-install.sh
    The installation time varies depending on the capacity of your system.
At the end of this procedure, the Fabric Services System application has been restored from your backup and is ready for use.
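Optionally, before logging in, you can verify that the application pods have started; the command below is generic kubectl, not a Fabric Services System script:
[root@fss-deployer ~]$ kubectl get pods -A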

Restoring Digital Sandbox fabrics

Digital Sandbox fabrics are not automatically restarted after a deployment has been restored from a backup. To recover all the Digital Sandbox fabrics, follow these steps:
  1. Complete the procedure Updating the Digital Sandbox.
    This action triggers a full redeployment of all Digital Sandbox components for the region.
  2. From the Fabric intents overview page, wait until all nodes show up as blue.
    This state indicates all nodes are associated and are up.
  3. Complete the procedure Deploying a fabric intent in the Digital Sandbox.
  4. Repeat Steps 2 and 3 for each Digital Sandbox fabric.
    At the end of the procedure, all Digital Sandbox fabrics should be running in the same state as before the backup.
  5. Restore previously added software catalog images.
    If you previously added SR Linux images to the software catalog, you need to modify the software catalog and then upload the SR Linux images. For instructions, see Adding a new network operating system version to the software catalog and Uploading SR Linux container images for Digital Sandbox.