How do I restore the Kubernetes etcd data in an NSP cluster?
Purpose
CAUTION: System Data Corruption
Attempting to restore the etcd data from one NSP cluster to a different NSP cluster causes the restore to fail, and renders the NSP cluster unrecoverable.
You must restore only an etcd data backup from the same NSP cluster; you cannot move an NSP cluster configuration to a different cluster, or restore a cluster configuration in a new cluster.
An etcd data backup, called a snapshot, captures all Kubernetes objects and associated critical information. A scheduled etcd data snapshot is performed daily. The following procedure describes how to recover a failed NSP cluster by restoring the etcd data from a snapshot.
Steps
Obtain and distribute snapshot
1. Log in as the root or NSP admin user on the NSP cluster host.
2. Enter the following to identify the namespace of the nsp-backup-storage pod:
# kubectl get pods -A | grep nsp-backup ↵
The leftmost entry in the output line is the namespace, which in the following example is nsp-psa-restricted:
nsp-psa-restricted   nsp-backup-storage-0   1/1   Running   0   5h16m
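If you prefer to capture the namespace programmatically for reuse in later steps, a minimal sketch (the NS variable name is illustrative, not part of the procedure):
# NS=$(kubectl get pods -A | grep nsp-backup-storage | awk '{print $1}') ↵
# echo $NS ↵
nsp-psa-restricted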
3. Record the namespace value.
4. Enter the following to identify the etcd snapshot to restore:
# kubectl exec -n namespace nsp-backup-storage-0 -- ls -la /tmp/backups/nsp-etcd/ ↵
where namespace is the namespace value recorded in Step 3
The directory contents are listed; the filename format of an etcd snapshot is:
nsp-etcd_backup_timestamp.tar.gz
where timestamp is the snapshot creation time
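For example, using the nsp-psa-restricted namespace shown in Step 2 (the timestamp and file details below are hypothetical):
# kubectl exec -n nsp-psa-restricted nsp-backup-storage-0 -- ls -la /tmp/backups/nsp-etcd/ ↵
-rw-r--r-- 1 root root 52428800 May 14 02:00 nsp-etcd_backup_20240514-0200.tar.gz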
5. Record the name of the snapshot file that you need to restore.
6. Enter the following to copy the snapshot file from the backup pod to an empty directory on the local file system:
# kubectl cp namespace/nsp-backup-storage-0:/tmp/backups/nsp-etcd/snapshot_file path/snapshot_file ↵
where
namespace is the namespace value recorded in Step 3
path is an empty local directory
snapshot_file is the snapshot file name recorded in Step 5
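For example, continuing with the hypothetical namespace, file name, and a hypothetical local directory /tmp/etcd-restore:
# mkdir /tmp/etcd-restore ↵
# kubectl cp nsp-psa-restricted/nsp-backup-storage-0:/tmp/backups/nsp-etcd/nsp-etcd_backup_20240514-0200.tar.gz /tmp/etcd-restore/nsp-etcd_backup_20240514-0200.tar.gz ↵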
7. Enter the following:
Note: The file lists either one member or three, depending on the deployment type.
# grep ETCD_INITIAL /etc/etcd.env ↵
Output like the following is displayed:
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://local_address:port
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_INITIAL_CLUSTER=etcd1=https://address_1:port,etcd2=https://address_2:port,etcd3=https://address_3:port
where
local_address is the IP address of the etcd cluster member you are operating from
address_1, address_2, and address_3 are the addresses of all etcd cluster members
port is a port number
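Because /etc/etcd.env contains plain KEY=value assignments, you can optionally load the recorded values into your shell for reuse in Step 14; a minimal sketch, assuming the file format shown above:
# eval "$(grep ETCD_INITIAL /etc/etcd.env)" ↵
# echo $ETCD_INITIAL_CLUSTER ↵
etcd1=https://address_1:port,etcd2=https://address_2:port,etcd3=https://address_3:port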
8. Transfer the snapshot file to an empty directory on each etcd cluster member, and then enter the following on each member to stop the etcd service:
Note: After this step, the etcd cluster is unreachable until the restore is complete.
# systemctl stop etcd ↵
The etcd service stops.
Restore database on etcd cluster members
9. Perform Step 11 to Step 19 on each etcd cluster member.
10. Go to Step 20.
11. Log in as the root or NSP admin user.
12. Navigate to the directory that contains the transferred snapshot file.
13. Enter the following:
# tar xzf path/nsp-etcd_backup_timestamp.tar.gz ↵
where
path is the absolute path of the snapshot file
timestamp is the snapshot creation time
The snapshot file is uncompressed.
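For example, assuming the hypothetical snapshot file was transferred to /tmp/etcd-restore on the member:
# cd /tmp/etcd-restore ↵
# tar xzf /tmp/etcd-restore/nsp-etcd_backup_20240514-0200.tar.gz ↵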
14. Enter the following:
# ETCDCTL_API=3 etcdctl snapshot restore etcd.db --name member --initial-cluster initial_cluster --initial-cluster-token token --initial-advertise-peer-urls URL ↵
where
member is the name of the cluster member you are working on, for example, etcd2
initial_cluster is the ETCD_INITIAL_CLUSTER list of cluster members recorded in Step 7
token is the ETCD_INITIAL_CLUSTER_TOKEN value recorded in Step 7
URL is the URL of the cluster member you are working on; for example, the etcd2 cluster member URL shown in Step 7 is https://address_2:port
The etcd database is restored.
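As a concrete illustration, assume hypothetical member addresses 10.1.2.11 through 10.1.2.13, the standard etcd peer port 2380, the k8s_etcd token shown in Step 7, and that you are working on member etcd2:
# ETCDCTL_API=3 etcdctl snapshot restore etcd.db --name etcd2 --initial-cluster etcd1=https://10.1.2.11:2380,etcd2=https://10.1.2.12:2380,etcd3=https://10.1.2.13:2380 --initial-cluster-token k8s_etcd --initial-advertise-peer-urls https://10.1.2.12:2380 ↵
The command writes the restored database to an ./etcd2.etcd directory in the working directory, which Step 17 moves into place.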
15. Enter the following to create a directory in which to store the previous database:
# mkdir path/old_etcd_db ↵
where path is the absolute path of the directory to create
16. Enter the following to move the previous database files to the created directory:
# mv /var/lib/etcd/* path/old_etcd_db ↵
where path is the absolute path of the directory created in Step 15
17. Enter the following to move the restored database files into place:
# mv ./member.etcd/* /var/lib/etcd/ ↵
where member is the member name specified in Step 14
The restored database files are moved to the /var/lib/etcd directory.
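Continuing the hypothetical etcd2 example, with /tmp/etcd-restore/old_etcd_db as the directory created in Step 15:
# mkdir /tmp/etcd-restore/old_etcd_db ↵
# mv /var/lib/etcd/* /tmp/etcd-restore/old_etcd_db ↵
# mv ./etcd2.etcd/* /var/lib/etcd/ ↵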
18. Enter the following to start the etcd service:
# systemctl start etcd ↵
The etcd service starts.
19. Enter the following to check the etcd service status:
# systemctl status etcd ↵
The etcd service status is displayed. The service is up if the following is displayed:
Active: active (running)
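Beyond the service status, you can optionally query etcd directly after all members are restarted; a minimal sketch (the endpoint uses the etcd client port, typically 2379 rather than the peer port shown in Step 7, and the certificate file paths are assumptions that vary by deployment):
# ETCDCTL_API=3 etcdctl --endpoints=https://local_address:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/node.pem --key=/etc/ssl/etcd/ssl/node-key.pem endpoint health ↵
https://local_address:2379 is healthy: successfully committed proposal: took = ...ms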
20. When the etcd service is up on each cluster member, close the open console windows.

End of steps