How do I restore the Kubernetes etcd data in an NSP cluster?

Purpose
CAUTION

System Data Corruption

Attempting to restore the etcd data from one NSP cluster to a different NSP cluster causes the restore to fail, and renders the NSP cluster unrecoverable.

You must restore only an etcd data backup from the same NSP cluster; you cannot move an NSP cluster configuration to a different cluster, or restore a cluster configuration in a new cluster.

An etcd data backup, called a snapshot, captures all Kubernetes objects and associated critical information. A scheduled etcd data snapshot is performed daily. The following procedure describes how to recover a failed NSP cluster by restoring the etcd data from a snapshot.

Steps
Obtain and distribute snapshot
 

1

Log in as the root or NSP admin user on the NSP cluster host.


2

Enter the following to identify the namespace of the nsp-backup-storage pod:

kubectl get pods -A | grep nsp-backup ↵

The leftmost entry in the output line is the namespace, which in the following example is nsp-psa-restricted:

nsp-psa-restricted   nsp-backup-storage-0   1/1   Running   0   5h16m


3

Record the namespace value.
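
If you prefer to capture the value in a shell variable, the following is a minimal Bash sketch, assuming the pod listing shown above:

# capture the first column (the namespace) of the nsp-backup-storage entry
NS=$(kubectl get pods -A | grep nsp-backup-storage | awk '{print $1}')
echo $NS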


4

Enter the following to identify the etcd snapshot to restore:

kubectl exec -n namespace nsp-backup-storage-0 -- ls -la /tmp/backups/nsp-etcd/ ↵

where namespace is the namespace value recorded in Step 3

The directory contents are listed; the filename format of an etcd snapshot is:

nsp-etcd_backup_timestamp.tar.gz

where timestamp is the snapshot creation time


5

Record the name of the snapshot file that you need to restore.
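
If you need the most recent snapshot, one way to identify it is to sort the directory listing by modification time; a sketch, assuming the NS variable from the sketch after Step 3 (otherwise substitute the namespace recorded there):

# newest file first; the first line listed is the latest snapshot
kubectl exec -n $NS nsp-backup-storage-0 -- ls -t /tmp/backups/nsp-etcd/ | head -1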


6

Enter the following to copy the snapshot file from the backup pod to an empty directory on the local file system:

kubectl cp namespace/nsp-backup-storage-0:/tmp/backups/nsp-etcd/snapshot_file path/snapshot_file ↵

where

namespace is the namespace value recorded in Step 3

path is an empty local directory

snapshot_file is the snapshot file name recorded in Step 5
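
For illustration only, with a hypothetical namespace of nsp-psa-restricted, a hypothetical snapshot file name (the timestamp format in your file name may differ), and /opt/nsp/etcd-restore as the empty local directory:

mkdir -p /opt/nsp/etcd-restore
kubectl cp nsp-psa-restricted/nsp-backup-storage-0:/tmp/backups/nsp-etcd/nsp-etcd_backup_20240601-0200.tar.gz /opt/nsp/etcd-restore/nsp-etcd_backup_20240601-0200.tar.gz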


7

Enter the following:

Note: The file lists either one member or three members, depending on the deployment type.

grep ETCD_INITIAL /etc/etcd.env ↵

Output like the following is displayed.

ETCD_INITIAL_ADVERTISE_PEER_URLS=https://local_address:port

ETCD_INITIAL_CLUSTER_STATE=existing

ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd

ETCD_INITIAL_CLUSTER=etcd1=https://address_1:port,etcd2=https://address_2:port,etcd3=https://address_3:port

where

local_address is the IP address of the etcd cluster member you are operating from

address_1, address_2, and address_3 are the addresses of all etcd cluster members

port is a port number
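
Because the ETCD_INITIAL values are reused in Step 14, it can be convenient to load them into the shell; a sketch, assuming /etc/etcd.env contains only plain KEY=value lines as shown above:

# make the ETCD_INITIAL_* values available as shell variables
source /etc/etcd.env
echo $ETCD_INITIAL_CLUSTER
echo $ETCD_INITIAL_CLUSTER_TOKEN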


8

Perform the following on each etcd cluster member.

Note: After this step, the etcd cluster is unreachable until the restore is complete.

  1. Log in as the root or NSP admin user.

  2. Enter the following:

    systemctl stop etcd ↵

    The etcd service stops.

  3. Transfer the snapshot file obtained in Step 6 to the cluster member.
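
For example, if the snapshot file was copied to /opt/nsp/etcd-restore in Step 6, a transfer using scp might look like the following, where member_address and the file name are hypothetical:

scp /opt/nsp/etcd-restore/nsp-etcd_backup_20240601-0200.tar.gz root@member_address:/opt/nsp/etcd-restore/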


Restore database on etcd cluster members
 

9

Perform Step 11 to Step 19 on each etcd cluster member.


10 

Go to Step 20.


11 

Log in as the root or NSP admin user.


12 

Navigate to the directory that contains the transferred snapshot file.


13 

Enter the following:

tar xzf path/nsp-etcd_backup_timestamp.tar.gz ↵

where

path is the absolute path of the snapshot file

timestamp is the snapshot creation time

The snapshot file is uncompressed.
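
Optionally, you can sanity-check the extracted snapshot before restoring it; a sketch, assuming the extracted file is named etcd.db as used in Step 14 and that etcdctl is in the PATH:

# print the snapshot hash, revision, total key count, and size
ETCDCTL_API=3 etcdctl snapshot status etcd.db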


14 

Enter the following:

ETCDCTL_API=3 etcdctl snapshot restore etcd.db --name member --initial-cluster initial_cluster --initial-cluster-token token --initial-advertise-peer-urls URL ↵

where

member is the name of the cluster member you are working on, for example, etcd2

initial_cluster is the ETCD_INITIAL_CLUSTER list of cluster members recorded in Step 7

token is the ETCD_INITIAL_CLUSTER_TOKEN value recorded in Step 7

URL is the URL of the cluster member you are working on; for example, the etcd2 cluster member URL shown in Step 7 is https://address_2:port

The etcd database is restored.
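
For illustration only, assuming the member name etcd2, the example member addresses 192.0.2.11 through 192.0.2.13, and the default etcd peer port 2380 (substitute the values recorded in Step 7):

ETCDCTL_API=3 etcdctl snapshot restore etcd.db \
  --name etcd2 \
  --initial-cluster etcd1=https://192.0.2.11:2380,etcd2=https://192.0.2.12:2380,etcd3=https://192.0.2.13:2380 \
  --initial-cluster-token k8s_etcd \
  --initial-advertise-peer-urls https://192.0.2.12:2380

In this example, the restored data is written to a new ./etcd2.etcd directory in the working directory; this is the member.etcd directory referenced in Step 17.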


15 

Enter the following to create a directory in which to store the previous database:

mkdir path/old_etcd_db ↵

where path is the absolute path of the directory to create


16 

Enter the following to move the previous database files to the created directory:

mv /var/lib/etcd/* path/old_etcd_db ↵

where path is the absolute path of the directory created in Step 15
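
For illustration, with /opt/nsp as a hypothetical parent directory, Step 15 and Step 16 might look like the following:

mkdir /opt/nsp/old_etcd_db
mv /var/lib/etcd/* /opt/nsp/old_etcd_db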


17 

Enter the following:

mv ./member.etcd/* /var/lib/etcd/ ↵

where member is the member name specified in Step 14

The restored database files are moved to the /var/lib/etcd directory.
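
Continuing the etcd2 illustration from Step 14, the command would be:

mv ./etcd2.etcd/* /var/lib/etcd/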


18 

Enter the following:

systemctl start etcd ↵

The etcd service starts.


19 

Enter the following:

systemctl status etcd ↵

The etcd service status is displayed.

The service is up if the following is displayed:

Active: active (running)


20 

When the etcd service is up, close the open console windows.
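
Optionally, before you close the consoles, you can confirm from the NSP cluster host that the Kubernetes API is serving data from the restored database; for example:

kubectl get nodes
kubectl get pods -A | grep nsp-backup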

End of steps