Migrating HA Kubernetes Cluster from Rocky Linux 8 to Rocky Linux 9
Source: https://www.lisenet.com/2023/migrating-ha-kubernetes-cluster-from-rocky-linux-8-to-rocky-linux-9/
The Upgrade Plan
We are going to upgrade our Kubernetes homelab nodes from Rocky 8 to Rocky 9.
We have a cluster of six nodes, three control planes and three worker nodes, all of which are KVM guests running Rocky 8.

We will upgrade the control plane nodes first, one at a time using Packer images and Ansible playbooks, and then upgrade the worker nodes, also one at a time, using the same approach.
This is a lengthy but zero-downtime process, and it does not require rebuilding the Kubernetes cluster from scratch. Note that we will not be upgrading the Kubernetes version.
Software versions before the upgrade:
- Rocky 8
- Containerd 1.6
- Kubernetes 1.26
- Calico 3.25
- Istio 1.17
Software versions after the upgrade:
- Rocky 9
- Containerd 1.6
- Kubernetes 1.26
- Calico 3.25
- Istio 1.17
SELinux is set to enforcing mode.
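To double-check the mode on any of the nodes, a quick getenforce over SSH will do; it should report Enforcing both before and after the rebuild:
$ ssh root@srv31.hl.test "getenforce"
Enforcing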
Configuration Files
For the Packer setup, see the GitHub repository here.
For the Ansible playbooks, see the GitHub repository here.
Cluster Information
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srv31 Ready control-plane 347d v1.26.4 10.11.1.31 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv32 Ready control-plane 347d v1.26.4 10.11.1.32 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv33 Ready control-plane 477d v1.26.4 10.11.1.33 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv34 Ready none 477d v1.26.4 10.11.1.34 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv35 Ready none 347d v1.26.4 10.11.1.35 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv36 Ready none 477d v1.26.4 10.11.1.36 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
Build a Rocky 9 KVM Image with Packer
First of all, we need to build a Rocky 9 KVM image using Packer.
$ git clone https://github.com/lisenet/kubernetes-homelab.git
$ cd ./kubernetes-homelab/packer
$ PACKER_LOG=1 packer build ./rocky9.json
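Before touching any cluster node, it is worth confirming that the image was actually produced and is a valid qcow2 file (the path below assumes the artifact location used by the scp commands later in this post):
$ qemu-img info ./artifacts/qemu/rocky9/rocky9.qcow2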
Upgrade the First Control Plane Node
We will start with srv31.
Drain and Delete Control Plane from Kubernetes Cluster
Drain and delete the control plane from the cluster:
$ kubectl drain srv31 --ignore-daemonsets
$ kubectl delete node srv31
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv32 Ready control-plane 347d v1.26.4
srv33 Ready control-plane 477d v1.26.4
srv34 Ready none 477d v1.26.4
srv35 Ready none 347d v1.26.4
srv36 Ready none 477d v1.26.4
The cluster will remain operational as long as the other two control planes are online.
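If you want to be extra careful before moving on, the API server and etcd can be checked; the health endpoint below is queried via the API server, and etcdctl runs from one of the surviving etcd pods:
$ kubectl get --raw='/readyz?verbose'
$ kubectl exec etcd-srv32 \
-n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
endpoint health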

Delete Control Plane from Etcd Cluster
Etcd will have a record of all three control plane nodes. We therefore have to delete the control plane node from the Etcd cluster too.
$ kubectl get pods -n kube-system -l component=etcd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
etcd-srv32 1/1 Running 4 (2d ago) 20d 10.11.1.32 srv32 none none
etcd-srv33 1/1 Running 4 (2d ago) 20d 10.11.1.33 srv33 none none
Query the cluster for the Etcd members:
$ kubectl exec etcd-srv32 \
-n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
member list
c36952e9f5bf4f49, started, srv33, https://10.11.1.33:2380, https://10.11.1.33:2379, false
c44657d8f6e7dea5, started, srv31, https://10.11.1.31:2380, https://10.11.1.31:2379, false
e279a8288f4be237, started, srv32, https://10.11.1.32:2380, https://10.11.1.32:2379, false
Delete the member for control plane srv31:
$ kubectl exec etcd-srv32 \
-n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
member remove c44657d8f6e7dea5
Member c44657d8f6e7dea5 removed from cluster 53e3f96426ba03f3
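To verify the removal, ask etcdctl for the status of every remaining endpoint; only srv32 and srv33 should be reported:
$ kubectl exec etcd-srv32 \
-n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
endpoint status --cluster --write-out=table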
Delete Control Plane KVM Guest
SSH into the hypervisor where the control plane server is running, and stop the VM:
$ ssh root@kvm1.hl.test "virsh destroy srv31-master"
Domain 'srv31-master' destroyed
Delete the current KVM snapshot (it’s the one from the previous Kubernetes upgrade):
$ ssh root@kvm1.hl.test "virsh snapshot-delete srv31-master --current"
Delete the control plane server image, including its storage:
$ ssh root@kvm1.hl.test "virsh undefine srv31-master --remove-all-storage"
Domain srv31-master has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv31.qcow2) removed.
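A quick virsh list on the hypervisor confirms the guest is gone; srv31-master should no longer appear:
$ ssh root@kvm1.hl.test "virsh list --all"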
Create a Rocky Linux Control Plane KVM Guest
Copy the Rocky 9 image that was built with Packer to the hypervisor for srv31:
$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 root@kvm1.hl.test:/var/lib/libvirt/images/srv31.qcow2
Provision a new srv31 control plane KVM guest:
$ virt-install \
--connect qemu+ssh://root@kvm1.hl.test/system \
--name srv31-master \
--network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:31 \
--disk path=/var/lib/libvirt/images/srv31.qcow2,size=32 \
--import \
--ram 4096 \
--vcpus 2 \
--os-type linux \
--os-variant centos8 \
--sound none \
--rng /dev/urandom \
--virt-type kvm \
--wait 0
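Because virt-install was told not to wait (--wait 0), it returns immediately; the guest's state can be checked on the hypervisor while it boots:
$ ssh root@kvm1.hl.test "virsh domstate srv31-master"
running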
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment.
$ git clone https://github.com/lisenet/homelab-ansible.git
$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub root@srv31.hl.test
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml
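Since only srv31 was rebuilt, the run can optionally be restricted to that host with Ansible's --limit option (assuming the host appears under the same name in the inventory):
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml --limit srv31.hl.test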
Prepare Kubernetes Cluster for Control Plane Node to Join
SSH into a working control plane node, srv32, and re-upload certificates:
$ ssh root@srv32.hl.test "kubeadm init phase upload-certs --upload-certs"
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78
Print the join command on the same control plane node:
$ ssh root@srv32.hl.test "kubeadm token create --print-join-command"
kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1
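Keep in mind that both values are short-lived: the certificate key uploaded by upload-certs expires after two hours, and bootstrap tokens default to a 24-hour TTL, so generate them shortly before joining. Existing tokens can be listed on a control plane node:
$ ssh root@srv32.hl.test "kubeadm token list"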
SSH into the newly created control plane srv31 and join the Kubernetes cluster:
$ ssh root@srv31.hl.test \
"kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
--discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1 \
--control-plane \
--certificate-key d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78"
Restart kubelet on srv31:
$ ssh root@srv31.hl.test "systemctl restart kubelet"
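The kubelet should come back as active; if it does not, its journal on the node is the first place to look:
$ ssh root@srv31.hl.test "systemctl is-active kubelet"
$ ssh root@srv31.hl.test "journalctl -u kubelet --no-pager -n 50"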
Check cluster status:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srv31 Ready control-plane 11m v1.26.4 10.11.1.31 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv32 Ready control-plane 347d v1.26.4 10.11.1.32 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv33 Ready control-plane 477d v1.26.4 10.11.1.33 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv34 Ready none 477d v1.26.4 10.11.1.34 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv35 Ready none 348d v1.26.4 10.11.1.35 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv36 Ready none 477d v1.26.4 10.11.1.36 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
We have our very first control plane running on Rocky 9.
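Etcd should also be back to three members, with srv31 re-added under a new member ID:
$ kubectl exec etcd-srv32 \
-n kube-system -- etcdctl \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
member list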
Repeat the process for the other two control planes, srv32 and srv33.
Do not proceed further until all control planes have been upgraded:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srv31 Ready control-plane 89m v1.26.4 10.11.1.31 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv32 Ready control-plane 32m v1.26.4 10.11.1.32 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv33 Ready control-plane 52s v1.26.4 10.11.1.33 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv34 Ready none 477d v1.26.4 10.11.1.34 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv35 Ready none 348d v1.26.4 10.11.1.35 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv36 Ready none 477d v1.26.4 10.11.1.36 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
$ kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-57b57c56f-l9ps5 1/1 Running 0 24m 192.168.134.2 srv31 none none
calico-node-4x79f 1/1 Running 0 164m 10.11.1.31 srv31 none none
calico-node-54c25 1/1 Running 0 29m 10.11.1.32 srv32 none none
calico-node-7fmzb 1/1 Running 1 9d 10.11.1.36 srv36 none none
calico-node-hvh28 1/1 Running 0 4m39s 10.11.1.33 srv33 none none
calico-node-p5vkt 1/1 Running 1 9d 10.11.1.35 srv35 none none
calico-node-stfm6 1/1 Running 1 9d 10.11.1.34 srv34 none none
coredns-787d4945fb-9dq4q 1/1 Running 0 110m 192.168.134.1 srv31 none none
coredns-787d4945fb-k67rx 1/1 Running 0 24m 192.168.134.3 srv31 none none
etcd-srv31 1/1 Running 0 157m 10.11.1.31 srv31 none none
etcd-srv32 1/1 Running 0 26m 10.11.1.32 srv32 none none
etcd-srv33 1/1 Running 0 4m36s 10.11.1.33 srv33 none none
kube-apiserver-srv31 1/1 Running 6 164m 10.11.1.31 srv31 none none
kube-apiserver-srv32 1/1 Running 4 29m 10.11.1.32 srv32 none none
kube-apiserver-srv33 1/1 Running 0 4m38s 10.11.1.33 srv33 none none
kube-controller-manager-srv31 1/1 Running 0 164m 10.11.1.31 srv31 none none
kube-controller-manager-srv32 1/1 Running 0 29m 10.11.1.32 srv32 none none
kube-controller-manager-srv33 1/1 Running 0 4m38s 10.11.1.33 srv33 none none
kube-proxy-5d25q 1/1 Running 0 4m39s 10.11.1.33 srv33 none none
kube-proxy-bpbrc 1/1 Running 0 29m 10.11.1.32 srv32 none none
kube-proxy-ltssd 1/1 Running 1 9d 10.11.1.36 srv36 none none
kube-proxy-rqmk6 1/1 Running 0 164m 10.11.1.31 srv31 none none
kube-proxy-z9wg2 1/1 Running 2 9d 10.11.1.35 srv35 none none
kube-proxy-zkj8c 1/1 Running 1 9d 10.11.1.34 srv34 none none
kube-scheduler-srv31 1/1 Running 0 164m 10.11.1.31 srv31 none none
kube-scheduler-srv32 1/1 Running 0 29m 10.11.1.32 srv32 none none
kube-scheduler-srv33 1/1 Running 0 4m38s 10.11.1.33 srv33 none none
metrics-server-77dff74649-lkhll 1/1 Running 0 146m 192.168.135.194 srv34 none none
Upgrade Worker Nodes
We will start with srv34.
Drain and Delete Worker Node from Kubernetes Cluster
$ kubectl drain srv34 --delete-emptydir-data --ignore-daemonsets
$ kubectl delete node srv34
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv31 Ready control-plane 89m v1.26.4
srv32 Ready control-plane 32m v1.26.4
srv33 Ready control-plane 52s v1.26.4
srv35 Ready none 348d v1.26.4
srv36 Ready none 477d v1.26.4
Stop the server:
$ ssh root@kvm1.hl.test "virsh destroy srv34-node"
Domain srv34-node destroyed
Delete the current snapshot:
$ ssh root@kvm1.hl.test "virsh snapshot-delete srv34-node --current"
Delete the server, including its storage:
$ ssh root@kvm1.hl.test "virsh undefine srv34-node --remove-all-storage"
Domain srv34-node has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv34.qcow2) removed.
Create a Rocky Linux Worker Node KVM Guest
Copy the Rocky 9 image that was built with Packer to the hypervisor for srv34:
$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 root@kvm1.hl.test:/var/lib/libvirt/images/srv34.qcow2
Provision a new srv34 worker node KVM guest:
$ virt-install \
--connect qemu+ssh://root@kvm1.hl.test/system \
--name srv34-node \
--network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:34 \
--disk path=/var/lib/libvirt/images/srv34.qcow2,size=32 \
--import \
--ram 8192 \
--vcpus 4 \
--os-type linux \
--os-variant centos8 \
--sound none \
--rng /dev/urandom \
--virt-type kvm \
--wait 0
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:
$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub root@srv34.hl.test
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml
SSH into the newly created worker node srv34 and join the Kubernetes cluster:
$ ssh root@srv34.hl.test \
"kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
--discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1 "
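The token and discovery hash above are the ones generated earlier on srv32. If more than 24 hours have passed since the control plane upgrades, the token will have expired and a fresh join command needs to be printed again:
$ ssh root@srv32.hl.test "kubeadm token create --print-join-command"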
Restart kubelet on srv34:
$ ssh root@srv34.hl.test "systemctl restart kubelet"
Check cluster status:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srv31 Ready control-plane 109m v1.26.4 10.11.1.31 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv32 Ready control-plane 52m v1.26.4 10.11.1.32 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv33 Ready control-plane 21m v1.26.4 10.11.1.33 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv34 Ready none 38s v1.26.4 10.11.1.34 none Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64 containerd://1.6.20
srv35 Ready none 348d v1.26.4 10.11.1.35 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
srv36 Ready none 477d v1.26.4 10.11.1.36 none Rocky Linux 8.7 (Green Obsidian) 4.18.0-372.9.1.el8.x86_64 containerd://1.6.20
Repeat the process for the other two worker nodes, srv35 and srv36.
The end result should be all nodes running Rocky 9:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion
NAME VERSION OS-IMAGE KERNEL
srv31 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
srv32 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
srv33 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
srv34 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
srv35 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
srv36 v1.26.4 Rocky Linux 9.2 (Blue Onyx) 5.14.0-284.18.1.el9_2.x86_64
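As a final sanity check, make sure no pods were left in a non-running state after the rolling rebuild; the command below should return no pods:
$ kubectl get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded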