Upgrade from 3.16.4 to 3.18 failing with CrashLoopBackOff

Hello,
I have had a 4-node Kubernetes cluster running Calico 3.16 on VMware VMs for months. I recently upgraded Kubernetes to v1.20.5 successfully. When I tried to upgrade Calico using these instructions (xxxxx://docs.projectcalico.org/archive/v3.18/getting-started/clis/calicoctl/install), I saw continuous CrashLoopBackOff conditions. I restored the cluster/VMs to the original pre-upgrade state running Kubernetes 1.20.5 with Calico 3.16, and everything is working. I am just learning, so I may be doing something wrong, and I would certainly appreciate any help or suggestions.

My initial install used this manifest (kubectl create -f xxxxx://docs.projectcalico.org/manifests/tigera-operator.yaml). Here was my working configuration:

$ cat custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://docs.projectcalico.org/v3.16/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 192.168.1.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
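
If I remember right, I applied this the standard way from the operator install docs, with custom-resources.yaml being my local copy of the file above:

$ kubectl create -f custom-resources.yaml
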
$ kubectl get pods -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7487d7f956-nxkk7   1/1     Running   14         142d
calico-node-5gbkk                          1/1     Running   10         142d
calico-node-psjtf                          1/1     Running   13         142d
calico-node-wf928                          1/1     Running   16         142d
calico-node-zl8kz                          1/1     Running   12         142d
calico-typha-9656ff977-5bhkn               1/1     Running   20         142d
calico-typha-9656ff977-dznhv               1/1     Running   1          17h
calico-typha-9656ff977-j74mf               1/1     Running   1          19h
$ kubectl get pods -n kube-system
NAME                          READY   STATUS    RESTARTS   AGE
calicoctl                     1/1     Running   14         142d
coredns-74ff55c5b-687d9       1/1     Running   2          21h
coredns-74ff55c5b-pm676       1/1     Running   2          21h
etcd-k8y                      1/1     Running   3          22h
kube-apiserver-k8y            1/1     Running   3          22h
kube-controller-manager-k8y   1/1     Running   3          22h
kube-proxy-9pk97              1/1     Running   2          22h
kube-proxy-fdx77              1/1     Running   2          22h
kube-proxy-vrjzf              1/1     Running   3          22h
kube-proxy-wkp5l              1/1     Running   3          22h
kube-scheduler-k8y            1/1     Running   3          22h

My failed upgrade attempt followed the documentation for “Upgrading an installation that uses an etcd datastore” and “Install calicoctl as a Kubernetes pod”, but I have no idea if this is the correct process.
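
In hindsight, I could have checked which datastore the cluster uses before choosing an upgrade path; the calicoctl pod that was already running reports it:

$ kubectl exec -ti -n kube-system calicoctl -- calicoctl version

If the Cluster Type line contains kdd, the cluster is using the Kubernetes API datastore, and the etcd instructions would not apply.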

$ kubectl apply -f calico-etcd.yaml
secret/calico-etcd-secrets created
configmap/calico-config created
Warning: resource clusterroles/calico-kube-controllers is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers configured
Warning: resource clusterrolebindings/calico-kube-controllers is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers configured
Warning: resource clusterroles/calico-node is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/calico-node configured
Warning: resource clusterrolebindings/calico-node is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrolebinding.rbac.authorization.k8s.io/calico-node configured
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created

$ kubectl apply -f canal-etcd.yaml
secret/calico-etcd-secrets configured
configmap/canal-config created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node unchanged
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/canal-flannel created
clusterrolebinding.rbac.authorization.k8s.io/canal-calico created
daemonset.apps/canal-node created
serviceaccount/canal-node created
clusterrolebinding.rbac.authorization.k8s.io/canal created
clusterrole.rbac.authorization.k8s.io/canal created
deployment.apps/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged
poddisruptionbudget.policy/calico-kube-controllers unchanged
job.batch/configure-canal created

$ kubectl get pods -n kube-system
NAME                          READY   STATUS             RESTARTS   AGE
calico-node-2fh47             0/1     CrashLoopBackOff   7          15m
calico-node-mh492             0/1     CrashLoopBackOff   7          15m
calico-node-s2tjn             0/1     CrashLoopBackOff   7          15m
calico-node-shc9g             0/1     CrashLoopBackOff   6          15m
calicoctl                     1/1     Running            17         142d
canal-node-55x4f              1/2     CrashLoopBackOff   11         47m
canal-node-96zlh              1/2     Running            11         47m
canal-node-msqcx              1/2     CrashLoopBackOff   7          47m
canal-node-p6tql              1/2     Running            13         47m
coredns-74ff55c5b-687d9       0/1     Completed          3          23h
coredns-74ff55c5b-pm676       0/1     Completed          3          23h
etcd-k8y                      1/1     Running            6          25h
kube-apiserver-k8y            1/1     Running            6          25h
kube-controller-manager-k8y   1/1     Running            6          25h
kube-proxy-9pk97              1/1     Running            4          25h
kube-proxy-fdx77              1/1     Running            4          25h
kube-proxy-vrjzf              1/1     Running            6          25h
kube-proxy-wkp5l              1/1     Running            5          25h
kube-scheduler-k8y            1/1     Running            6          25h

Reading through GitHub, I noticed the open issue “Missing upgrade instructions for installs using Tigera Operator #4426” with this note:
"In the meantime, I believe the upgrade procedure (for a system which was installed using tigera-operator already) is just:

kubectl apply -f https://docs.projectcalico.org/archive/v3.18/manifests/tigera-operator.yaml

This ran with warnings like:
Warning: resource deployments/tigera-operator is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
deployment.apps/tigera-operator configured

The cluster appears stable, but I am not sure the upgrade is complete.
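
As a sanity check, the operator exposes a status resource; if I understand the operator docs correctly, this should report the calico component as Available once the rollout is done:

$ kubectl get tigerastatus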

Is there a separate command to update the client?

$ kubectl exec -ti -n kube-system calicoctl -- calicoctl version
Client Version: v3.16.4
Git commit: 51418082
Cluster Version: v3.18.1
Cluster Type: typha,kdd,k8s,operator,bgp,kubeadm

To update the client, it was pointed out:
Just delete it and create a new one from the calicoctl install doc in the uplevel version. It's not really an “upgrade”, so much as “throw away the old one and install the new”.

So I ran the following:

$ kubectl exec -ti -n kube-system calicoctl -- calicoctl version
Client Version:    v3.16.4
Git commit:        51418082
Cluster Version:   v3.18.1
Cluster Type:      typha,kdd,k8s,operator,bgp,kubeadm

$ kubectl get pod -n kube-system calicoctl

NAME        READY   STATUS    RESTARTS   AGE
calicoctl   1/1     Running   17         143d

$ kubectl delete pod -n kube-system calicoctl

pod "calicoctl" deleted

$ kubectl get pod -n kube-system calicoctl

Error from server (NotFound): pods "calicoctl" not found

$ kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml

serviceaccount/calicoctl unchanged
pod/calicoctl created
clusterrole.rbac.authorization.k8s.io/calicoctl unchanged
clusterrolebinding.rbac.authorization.k8s.io/calicoctl unchanged

$ kubectl exec -ti -n kube-system calicoctl -- calicoctl version

Client Version:    v3.18.1
Git commit:        911f383f
Cluster Version:   v3.18.1
Cluster Type:      typha,kdd,k8s,operator,bgp,kubeadm

It was also pointed out:
https://docs.projectcalico.org/manifests/calicoctl.yaml is the one to use when your cluster is using the kubernetes api as a datastore.
This is shown by the `Cluster Type: ...kdd,...` output from calicoctl version.
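
Presumably, for a cluster whose Cluster Type does not include kdd (i.e., one using etcd directly), the etcd variant of that manifest would be the one to use instead:

kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl-etcd.yaml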

All good now.

That looks like the right process to me; just apply the new operator manifest over the old one. You could use kubectl replace instead to avoid the warning, I think.
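
Something like this, using the same archive manifest URL as above:

kubectl replace -f https://docs.projectcalico.org/archive/v3.18/manifests/tigera-operator.yaml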