Restore Etcd of a Cluster Managed by Cluster Templates
This guide shows you how to restore a cluster's etcd to an earlier snapshot. This is useful when you need to revert a cluster to an earlier state.
This tutorial has the following requirements:

- The omnictl CLI tool must be installed and configured.
- The cluster you want to restore must still exist (i.e., it has not been deleted from Omni) and have past backups available.
- The cluster must be managed using cluster templates (not via the UI).
Finding the Cluster's UUID
To find the cluster's UUID, run the following command, replacing my-cluster with the name of your cluster:
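A minimal example, assuming the UUID is exposed through the clusteruuid resource type:

```
omnictl get clusteruuid my-cluster
```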
The output will look like this:
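The exact values will differ for your cluster; an illustrative example with a placeholder UUID:

```
NAMESPACE   TYPE          ID           VERSION   UUID
default     ClusterUUID   my-cluster   1         b418ab46-63a1-4a7a-8302-6a79d1fb4d39
```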
Note the UUID column, which contains the cluster's UUID.
Finding the Snapshot to Restore
List the available snapshots for the cluster:
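One way to list them, assuming backups are exposed as EtcdBackup resources that can be filtered by the omni.sidero.dev/cluster label (both the resource name and the label are assumptions here):

```
omnictl get etcdbackup --selector omni.sidero.dev/cluster=my-cluster
```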
The output will look like this:
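The columns and values below are illustrative; only the SNAPSHOT column matters for the restore:

```
NAMESPACE   TYPE         ID                      VERSION   CREATED AT             SNAPSHOT
external    EtcdBackup   my-cluster-1701184522   1         2023-11-28T15:15:22Z   FFFFFFFF9A99FBFD.snapshot
```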
The SNAPSHOT column contains the snapshot name that you will need to restore the cluster. Let's assume you want to restore the cluster to the snapshot FFFFFFFF9A99FBFD.snapshot.
Deleting the Existing Control Plane
To restore the cluster, we first need to delete its existing control plane. This takes the cluster into a non-bootstrapped state; only then can we create a new control plane with the restored etcd.
Use the following command to delete the control plane, replacing my-cluster with the name of your cluster:
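A sketch of that command, assuming the control plane machine set follows the default <cluster-name>-control-planes naming:

```
omnictl delete machineset my-cluster-control-planes
```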
Creating the Restore Template
Edit your cluster template manifest template-manifest.yaml: adjust the list of control plane machines to your needs, and add the bootstrapSpec section to the control plane, with the cluster UUID and the snapshot name found above:
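A sketch of the resulting ControlPlane document, assuming bootstrapSpec takes clusterUUID and snapshot fields; the machine IDs and cluster UUID below are placeholders, and the snapshot is the one selected above:

```yaml
kind: ControlPlane
machines:
  - 430d882a-51a8-4aeb-9e44-0e3d1c4f2a1b
  - e865efbc-25a1-4436-8fd1-2f59bbd77a2c
  - 4e77dfa2-1c8f-4b72-bf37-5a9d8f8c3b6a
bootstrapSpec:
  clusterUUID: b418ab46-63a1-4a7a-8302-6a79d1fb4d39
  snapshot: FFFFFFFF9A99FBFD.snapshot
```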
Syncing the Template
To sync the template, run the following command:
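For example, assuming the manifest is saved as template-manifest.yaml:

```
omnictl cluster template sync --file template-manifest.yaml
```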
After the sync, your cluster will be restored to the snapshot you specified.
Restarting Kubelet on Worker Nodes
To ensure healthy cluster operation, the kubelet needs to be restarted on all worker nodes.
For this step, you need talosctl to be installed and a talosconfig configured for this cluster. You can download the talosconfig using the Web UI or via omnictl.
Get the IDs of the worker nodes:
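One possible way to list them, assuming worker machines carry an omni.sidero.dev/role-worker label alongside the cluster label (the resource and label names are assumptions):

```
omnictl get clustermachinestatus --selector 'omni.sidero.dev/role-worker,omni.sidero.dev/cluster=my-cluster'
```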
The output will look like this:
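An illustrative output with placeholder machine IDs:

```
NAMESPACE   TYPE                   ID                                     VERSION
default     ClusterMachineStatus   8d7a8c2e-1f0b-4b6d-9f3a-2c4d5e6f7a8b   1
default     ClusterMachineStatus   b1c2d3e4-5f6a-4b8c-9d0e-1f2a3b4c5d6e   1
```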
Gather the IDs in this output, and issue a kubelet restart on them using talosctl:
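With the cluster's talosconfig in place, the restart can target all worker node IDs at once (the IDs below are the placeholders from the output above):

```
talosctl -n 8d7a8c2e-1f0b-4b6d-9f3a-2c4d5e6f7a8b,b1c2d3e4-5f6a-4b8c-9d0e-1f2a3b4c5d6e service kubelet restart
```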