How do I replace an NSP cluster node?

Purpose

The following steps describe how to replace a node in an NSP cluster, as may be required in the event of a node failure.

Note: If root access for remote operations is disabled in the NSP configuration, remote operations such as SSH and SCP as the root user are not permitted within an NSP cluster. Steps that describe such an operation as the root user must be performed as the designated non-root user with sudoer privileges.

For simplicity, such steps describe only root-user access.

Note: release-ID in a file path has the following format:

R.r.p-rel.version

where

R.r.p is the NSP release, in the form MAJOR.minor.patch

version is a numeric value
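
For example, using hypothetical values, a release-ID of 24.4.0-rel.7 results in paths such as the following:

/opt/nsp/NSP-CN-DEP-24.4.0-rel.7/bin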

Node replacement in a standalone deployment

In order to perform the procedure in a standalone NSP deployment, a recent NSP system backup must be available, as described in the following note.

Note: If you need to replace a node in a standalone NSP cluster and do not have a recent NSP system backup, you cannot use the procedure to replace the node. Instead, you must recreate the cluster configuration; contact technical support for assistance.

CAUTION

Service outage

Performing the procedure in a standalone NSP deployment causes a service outage.

Ensure that you perform the procedure only during a scheduled maintenance period with the supervision of Nokia technical support.

Steps
Acquire node information
 

1

Log in as the root or NSP admin user on the NSP cluster host in the NSP cluster that requires the node replacement.


2

Open a console window.


3

Enter the following to show the node roles:

kubectl get nodes --show-kind ↵

Output like the following is displayed; the example below is for a three-node cluster:

NAME         STATUS   ROLES                  AGE   VERSION

node/node1   Ready    control-plane,master   18d   v1.20.7

node/node2   Ready    control-plane,master   18d   v1.20.7

node/node3   Ready    <none>                 18d   v1.20.7

If the Roles value for the node you are replacing includes control-plane or master, the node has a master role; if the value is <none>, the node is a worker node.
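
If the cluster has many nodes, you can optionally list only the nodes that have a master role by filtering on the control-plane role label, for example:

kubectl get nodes -l node-role.kubernetes.io/control-plane ↵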


4

Enter the following to show the node labels:

kubectl get nodes --show-labels ↵

NAME    STATUS   ROLES           AGE   VERSION   LABELS

node1   Ready    control-plane   10d   v1.32.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,etcd=true,isIngress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=test-node1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=

node2   Ready    control-plane   10d   v1.32.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,etcd=true,isIngress=true,kafka-0=true,kafka=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=test-node2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=

node3   Ready    <none>          10d   v1.32.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,etcd=true,file-service=true,isIngress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=node3,kubernetes.io/os=linux,linbit.com/hostname=node3,storage=true

node4   Ready    <none>          10d   v1.32.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,isIngress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=node4,kubernetes.io/os=linux,linbit.com/hostname=node4,storage=true

node5   Ready    <none>          10d   v1.32.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,isIngress=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=node5,kubernetes.io/os=linux,linbit.com/hostname=node5,storage=true

The storage=true label applies only to a multi-node setup as follows:

  • If an NSP cluster has more than three nodes, the storage label is set on the last three nodes by default.

  • You can set the storage label for a dedicated node.

    If the NSP cluster has more than three nodes and you want to set the label on nodes other than the last three nodes, set the label on a dedicated node as follows before performing Step 13; a verification example follows this list:

    kubectl label node node storage=true

    where node is the name of the dedicated node

  • For two-node and three-node NSP setups, Step 13 sets storage=true on all nodes.
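
To confirm which nodes carry the storage label, you can list them using a label selector, for example:

kubectl get nodes -l storage=true ↵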


Ensure correct DR cluster roles
 

5

If the NSP system is not a DR deployment, skip this step.

In order to replace an NSP cluster node, the node must be in the standby cluster. After the failure of a DR NSP cluster node that hosts an essential NSP service, an NSP switchover automatically occurs. However, no automatic switchover occurs for a node that does not host an essential service.

If the node to replace is currently in the primary NSP cluster, perform How do I perform an NSP DR switchover from the NSP UI?.


Stop NSP cluster
 

6

Stop the NSP cluster.

Note: If the NSP cluster VMs do not have the required SSH key, you must include the --ask-pass argument in the nspdeployerctl command, as shown in the following example, and are subsequently prompted for the root password of each cluster member:

nspdeployerctl --ask-pass uninstall --undeploy

Note: In a standalone deployment, performing this step marks the beginning of the service outage.

  1. Open a terminal session to the NSP deployer VM.

  2. Log in as the root or NSP admin user.

  3. Open the following file using a plain-text editor such as vi:

    /opt/nsp/NSP-CN-DEP-release-ID/NSP-CN-release-ID/config/nsp-config.yml

  4. Edit the following line in the platform section, kubernetes subsection, to read as shown below; a verification example follows this list:

      deleteOnUndeploy: false

  5. Save and close the file.

  6. Enter the following:

    cd /opt/nsp/NSP-CN-DEP-release-ID/bin ↵

  7. Enter the following:

    ./nspdeployerctl uninstall --undeploy ↵

    The NSP cluster stops.
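
If required, you can confirm the deleteOnUndeploy setting from substep 4 by searching the configuration file, for example:

grep deleteOnUndeploy /opt/nsp/NSP-CN-DEP-release-ID/NSP-CN-release-ID/config/nsp-config.yml ↵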


Reconfigure and start cluster
 

7

If the replacement node has the same IP address as the node you are replacing, you can skip this step.

Update the node IP address in the NSP cluster configuration.

  1. Open the following file on the NSP deployer VM using a plain-text editor such as vi:

    /opt/nsp/nsp-k8s-deployer-release-ID/config/k8s-deployer.yml

  2. Change the former node IP address to the new IP address.

  3. Save and close the file.

  4. Enter the following:

    cd /opt/nsp/nsp-k8s-deployer-release-ID/bin ↵

  5. Enter the following:

    ./nspk8sctl config -c ↵

    A new /opt/nsp/nsp-k8s-deployer-release-ID/config/hosts.yml file is created.
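
    You can optionally confirm that the replacement node address appears in the generated inventory, for example:

    grep -n address /opt/nsp/nsp-k8s-deployer-release-ID/config/hosts.yml ↵

    where address is the replacement node IP address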


8

Enter the following:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@address ↵

where address is the replacement node IP address

The required SSH key is transferred to the replacement node.
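
You can optionally verify key-based access to the replacement node; the following command should display the node hostname without prompting for a password:

ssh root@address hostname ↵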


9

Perform the following steps to back up the Kubernetes secrets.

  1. Enter the following on the NSP deployer VM:

    cd /opt/nsp/NSP-CN-DEP-release-ID/bin ↵

  2. Enter the following:

    ./nspdeployerctl secret -o backup_file backup ↵

    where backup_file is the absolute path and name of the backup file to create

    As the secrets are backed up, messages like the following are displayed for each Kubernetes namespace:

    Backing up secrets to /opt/backupfile...

      Including secret namespace:ca-key-pair-external

      Including secret namespace:ca-key-pair-internal

      Including secret namespace:nsp-tls-store-pass

    When the backup is complete, the following prompt is displayed:

    Please provide an encryption password for backup_file

    enter aes-256-ctr encryption password:

  3. Enter a password.

    The following prompt is displayed:

    Verifying - enter aes-256-ctr encryption password:

  4. Re-enter the password.

    The backup file is encrypted using the password.

  5. Record the password for use when restoring the backup.

  6. Record the name of the data center associated with the backup.

  7. Transfer the backup file to a secure location in a separate facility for safekeeping.
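
    For example, you could use scp to transfer the file; the station name and directory below are hypothetical:

    scp backup_file user@backup_station:/opt/nsp/backups/ ↵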


10 

Enter the following:

Note: If the NSP cluster VMs do not have the required SSH key, you must include the --ask-pass argument in the command, as shown in the following example, and are subsequently prompted for the root password of each cluster member:

nspk8sctl --ask-pass uninstall

./nspk8sctl uninstall ↵

The Kubernetes software in the cluster is uninstalled.


11 

Perform the following steps to restore the NSP Kubernetes secrets.

  1. Enter the following on the NSP deployer VM:

    cd /opt/nsp/NSP-CN-DEP-release-ID/bin ↵

  2. Enter the following:

    ./nspdeployerctl secret -i backup_file restore ↵

    where backup_file is the absolute path and filename of the secrets backup file created in Step 9

    The following prompt is displayed:

    Please provide the encryption password for /opt/backupfile

    enter aes-256-ctr decryption password:

  3. Enter the password recorded during the backup creation.

    As the secrets are restored, messages like the following are displayed for each Kubernetes namespace:

    Restoring secrets from backup_file...

    secret/ca-key-pair-external created

      Restored secret namespace:ca-key-pair-external

    secret/ca-key-pair-internal created

      Restored secret namespace:ca-key-pair-internal

    secret/nsp-tls-store-pass created

      Restored secret namespace:nsp-tls-store-pass


12 

Enter the following:

Note: If the NSP cluster VMs do not have the required SSH key, you must include the --ask-pass argument in the command, as shown in the following example, and are subsequently prompted for the root password of each cluster member:

nspk8sctl --ask-pass install

./nspk8sctl install ↵

The Kubernetes software in the cluster is re-installed.
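
You can optionally confirm on the NSP cluster host that all nodes, including the replacement node, are reported as Ready, for example:

kubectl get nodes ↵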


13 

Add the labels from the former node to the replacement node.

  1. Enter the following:

    cd /opt/nsp/NSP-CN-DEP-release-ID/bin ↵

  2. Enter the following:

    ./nspdeployerctl config ↵

  3. On the NSP cluster host, enter the following:

    kubectl get nodes node --show-labels ↵

    where node is the node name

  4. Verify that the labels match the labels recorded in Step 4.
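
    If a label recorded in Step 4 is missing from the replacement node, you can add it manually; the following example uses the etcd label from the Step 4 output:

    kubectl label node node etcd=true ↵

    where node is the node name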


14 

Update the node IP address in the NSP software configuration file.

  1. Open the following file on the NSP deployer VM using a plain-text editor such as vi:

    /opt/nsp/NSP-CN-DEP-release-ID/NSP-CN-release-ID/config/nsp-config.yml

  2. If the former node IP address is present in the file, replace it with the new IP address.

  3. Save and close the file.
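
To confirm that no occurrence of the former node IP address remains, you can search the file, for example:

grep -n old_address /opt/nsp/NSP-CN-DEP-release-ID/NSP-CN-release-ID/config/nsp-config.yml ↵

where old_address is the former node IP address; the command should return no output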


15 

Enter the following on the NSP deployer VM:

Note: If the NSP cluster VMs do not have the required SSH key, you must include the --ask-pass argument in the nspdeployerctl command, as shown in the following example, and are subsequently prompted for the root password of each cluster member:

nspdeployerctl --ask-pass install --config

/opt/nsp/NSP-CN-DEP-release-ID/bin/nspdeployerctl install --config ↵

The new node IP address is propagated to the deployment configuration.


16 

Perform one of the following.

Note: If the NSP cluster VMs do not have the required SSH key, you must include the --ask-pass argument in the nspdeployerctl command, as shown in the following example, and are subsequently prompted for the root password of each cluster member:

nspdeployerctl --ask-pass install --deploy

  1. If the NSP is deployed in a DR configuration, enter the following on the standby NSP deployer VM:

    /opt/nsp/NSP-CN-DEP-release-ID/bin/nspdeployerctl install --deploy ↵

    The NSP starts.

  2. If the NSP system is configured as an enhanced deployment without DR, you must ensure that the PostgreSQL and Neo4j databases in the cluster initialize on an existing node, and not on the replacement node, by cordoning the replacement node until after the initialization.

    1. Enter the following on the NSP cluster host:

      kubectl cordon node

      where node is the node name

      The replacement node is cordoned.

    2. On the NSP deployer VM, enter the following:

      /opt/nsp/NSP-CN-DEP-release-ID/bin/nspdeployerctl install --deploy ↵

    3. On the NSP cluster host, enter the following:

      kubectl get pods -A ↵

      The pods are listed.

    4. View the output; if all of the following conditions are not true, repeat substep 3; a pod-filtering example follows this step.

      Note: You must not proceed to the next step until the conditions are met.

      • One postgres-primary pod instance is in the Running state.

      • At least two nspos-neo4j-core pod instances are in the Running state.

      • At least two nsp-tomcat pod instances are in the Running state.

      • If nrcx-tomcat is installed, at least two nrcx-tomcat pods are in the Running state.

    5. When the conditions are met, enter the following:

      kubectl uncordon node

      The replacement node is uncordoned.

  3. If the NSP system is a standalone deployment, that is, neither a DR nor an enhanced deployment, perform How do I restore the NSP cluster databases? using a copy of the appropriate NSP system backup, which is typically the most recent.

    Note: Do not perform Step 4 of the restore procedure, which deletes all databases.

    Note: The restore procedure starts the NSP cluster when the restore is complete.
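
For example, when checking the pod conditions in substep 4 of the enhanced-deployment option, you can filter the pod list on the NSP cluster host to show only the pods of interest:

kubectl get pods -A | grep -E 'postgres-primary|nspos-neo4j-core|nsp-tomcat|nrcx-tomcat' ↵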


17 

Back up the NSP cluster data, as described in How do I back up the NSP cluster databases?.


18 

Close the open console windows.

End of steps