MicroK8s: Create a HA cluster
This tutorial assumes you already have a MicroK8s cluster with one node running; if you don't, check how to set up a MicroK8s cluster first. Once we have a MicroK8s cluster, we can add more nodes to make it highly available.
Adding a node
To add a new node to our cluster, we should obtain the microk8s join
command from the main node running the control plane:
microk8s add-node
Keep the output of this command safe, then access the node we want to add to the cluster.
The new node needs MicroK8s installed; if it isn't already installed, run the following command:
sudo snap install microk8s --classic
The --channel parameter should be specified if the main node is not running the latest MicroK8s version, as all nodes should run the same version.
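For example, if the main node tracks the 1.32 channel (the channel here is only an illustration; check what your main node is actually tracking with snap list microk8s), the new node can be installed from the same channel:
sudo snap install microk8s --classic --channel=1.32/stable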
Once our node has MicroK8s installed, we can run the microk8s join command we got from the main node:
microk8s join <ip>:<port>/<token>
The node can join the cluster as a worker node. Worker nodes are ideal for low-end devices as they consume fewer resources, and they also make sense in large clusters that already have enough control plane nodes to ensure HA.
Use the --worker flag to join the cluster as a worker node.
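For instance, reusing the placeholder address and token from the join command above, joining as a worker looks like this:
microk8s join <ip>:<port>/<token> --worker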
A few seconds later we should be able to see the node has joined the cluster:
microk8s kubectl get nodes
NAME                           STATUS   ROLES    AGE     VERSION
eu-central-1.binarycomet.net   Ready    <none>   6h46m   v1.32.3
eu-central-2.binarycomet.net   Ready    <none>   53s     v1.32.3
We should repeat these steps on every node we want to be part of the cluster.
Set failure domains
To make MicroK8s failure-domain aware, we need to assign an integer to each failure domain in /var/snap/microk8s/current/args/ha-conf.
echo "failure-domain=1" > /var/snap/microk8s/current/args/ha-conf
After this change we should restart the node:
microk8s stop
microk8s start
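If you have several nodes to configure, a small loop can apply the setting and restart each one remotely. This is only a sketch: the hostnames and user are hypothetical, and it assumes passwordless SSH with sudo rights on each node:
# Hypothetical hostnames; use the failure-domain value that matches each node's location
for node in node-a1 node-a2; do
  ssh ubuntu@"$node" \
    "echo 'failure-domain=1' | sudo tee /var/snap/microk8s/current/args/ha-conf && sudo microk8s stop && sudo microk8s start"
done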
What's a failure domain?
A failure domain is a group of nodes that share a common point of failure, such as an availability zone or other failure boundary.
For example, suppose we have a MicroK8s cluster with 6 nodes spread across 3 different data centers.
Each data center represents a separate failure domain: if there's a power or network outage in one data center, the two nodes in it go down, but the nodes in the other 2 data centers remain unaffected.
# On data center 1
echo "failure-domain=1" > /var/snap/microk8s/current/args/ha-conf
# On data center 2
echo "failure-domain=2" > /var/snap/microk8s/current/args/ha-conf
# On data center 3
echo "failure-domain=3" > /var/snap/microk8s/current/args/ha-conf
With these values, MicroK8s will try to distribute the datastore replicas across different failure domains, making the cluster fault tolerant.
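A quick way to confirm the setting took effect on a node is to print the file back and check that the value matches the node's data center:
cat /var/snap/microk8s/current/args/ha-conf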
Storage
If you're using the hostpath storage add-on, keep in mind it will only be available on the nodes where you have enabled it. For a highly available cluster you should set up alternative storage such as NFS or Ceph.
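As an illustration of the NFS route, a PersistentVolume backed by an external NFS export can be created from any node. This is only a minimal sketch rather than a full storage setup, and the server address and export path below are placeholders for your own NFS server:
# Minimal sketch: an NFS-backed PersistentVolume (server and path are placeholders)
microk8s kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.50      # hypothetical NFS server
    path: /srv/nfs/share   # hypothetical export path
EOF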
High Availability
If your cluster consists of three or more control plane nodes, the datastore will be replicated across them and will be resilient to a single failure: if a node goes down, the cluster will continue to run without interruption.
We can check if our cluster is highly available:
microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 10.128.63.86:19001 10.128.63.166:19001 10.128.63.43:19001
  datastore standby nodes: none
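One simple way to see this resilience in practice is to stop MicroK8s on one of the three control plane nodes and check the cluster from another node. This is just a quick smoke test, not a full failover drill:
# On one control plane node, simulate a failure
microk8s stop

# On another node, the cluster should still respond
microk8s status
microk8s kubectl get nodes

# Bring the stopped node back when done
microk8s start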