Migrating a cluster from flannel/canal to Calico

Hi all,

I’m very new to Calico and was hoping to get some help with errors I’m encountering while migrating a cluster from flannel networking to Calico.

Cluster creation:

  1. Followed the Rancher quickstart guide, using Terraform to create an AWS cluster here. By default, the networking provider is canal v3.13.
  2. Deployed and exposed an nginx workload, and accessed it from another pod.

Migration:
  3. Followed the migration doc. After installing the migration controller, I’m seeing an error that the remove-flannel pod could not be found.

Logs:

I’d appreciate any insight as to why this error is occurring and how to resolve it. Thanks in advance!

The flannel migration controller tries to run a flannel cleanup pod on each node. It seems the cleanup pod failed to start for some reason. Is this happening on the very first node to migrate, or have some nodes already migrated successfully?

Can you watch the pods in the kube-system namespace and see whether the cleanup pod comes up?
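
In case it helps, here is a rough sketch of one way to do that with kubectl. It assumes the cleanup pod's name contains "remove-flannel" (as in the error you reported); the function names are just made up for illustration.

```shell
# Sketch only: list kube-system pods and keep just the flannel cleanup pods.
# Assumption: the cleanup pod name contains "remove-flannel".
find_cleanup_pods() {
  kubectl get pods -n kube-system -o wide | grep -- 'remove-flannel'
}

# Live view: stream pod changes instead of taking a single snapshot.
# Stop it with Ctrl-C once the pod has appeared (or clearly has not).
watch_cleanup_pods() {
  kubectl get pods -n kube-system --watch | grep --line-buffered -- 'remove-flannel'
}
```

Running watch_cleanup_pods in a second terminal while the migration controller is working should show whether the pod is ever created and which status it ends in.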

Hello,

It looks like it’s getting through quite a bit of the migration process before failing to find a pod. Please see the attached complete set of logs and a screenshot of cluster events for more info.

Logs: https://pastebin.com/GZzR1twA
Cluster events:

Thanks!

It appears that a flannel-migration pod ran to completion and was deleted before flannel-migration-controller could clean it up. I’m not sure what’s happening to cause that, though.

Sorry for the late reply. Could you start a new cluster and apply the YAML below on one of your nodes? This is the same as running a remove-flannel pod, but it gives us a chance to see what could be going wrong.

Please fill in your node name and collect the logs for this pod. Thanks!

apiVersion: v1
kind: Pod
metadata:
  labels:
    flannel-migration: node
  name: remove-flannel
  namespace: kube-system
spec:
  hostNetwork: true
  restartPolicy: Never
  containers:
  - name: remove-flannel
    image: calico/node:v3.13.2
    command: ["/bin/sh", "-c"]
    args:
    - |
      ip link show flannel.1 || exit 0 && { echo '{ "name": "dummy", "plugins": [{ "type": "flannel-migration-in-progress" }]}' > /host//etc/cni/net.d/10-calico.conflist ; ip link delete flannel.1; } && exit 0 || exit 1
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /host//etc/cni/net.d
      name: host-dir
  volumes:
  - name: host-dir
    hostPath:
      path: /etc/cni/net.d
  nodeName: <your node name>
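
If it saves some typing, the following is a minimal sketch for filling in the node name and collecting the output. The file name remove-flannel.yaml and the function names are hypothetical; it assumes the manifest above was saved unchanged, placeholder included.

```shell
# Sketch only: substitute the nodeName placeholder in the saved manifest.
# $1: node name, $2: path to the manifest (e.g. a hypothetical remove-flannel.yaml)
render_remove_flannel() {
  sed "s/<your node name>/$1/" "$2"
}

# Apply the rendered manifest to the cluster.
apply_remove_flannel() {
  render_remove_flannel "$1" "$2" | kubectl apply -f -
}

# Once the pod has finished, gather its logs and its events.
collect_remove_flannel() {
  kubectl logs -n kube-system pod/remove-flannel
  kubectl describe pod remove-flannel -n kube-system
}
```

For example: apply_remove_flannel "$NODE" remove-flannel.yaml, wait for the pod to finish, then collect_remove_flannel.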

Running it on a worker node, the logs only show:

kubectl logs pod/remove-flannel -n kube-system
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 7e:16:ac:92:bb:53 brd ff:ff:ff:ff:ff:ff

If you are using Rancher and Canal, you will run into trouble following the migration tutorial. A normal attempt to migrate from Canal to Calico results in the error message below, which makes the migration fail.

To migrate to Calico, first install Canal using the YAML file from the documentation section. Remember to change the “Network” CIDR from its default value to your network CIDR, and everything will go smoothly.

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
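
A small sketch of how the "Network" value could be rewritten before applying the file. It is a plain text substitution, not anything Canal-specific; canal.yaml and the function name are hypothetical, and it assumes the field appears as "Network": "..." on a single line.

```shell
# Sketch only: replace the value of the "Network" field in text read from stdin.
# $1: the CIDR to write, e.g. the value of your cluster's --cluster-cidr flag
set_canal_network() {
  sed -E 's#("Network":[[:space:]]*")[^"]*(")#\1'"$1"'\2#'
}
```

Usage would look something like: set_canal_network "$CIDR" < canal.yaml | kubectl apply -f -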

Things to consider:

  • this works with v1.16.6
  • you need at least 3 nodes for the migration to work; you can check out my Terraform changes here.
  • if you are not sure what your cluster CIDR is, SSH into one of the nodes and run
    ps -ef | grep cluster-cidr
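
If you only want the CIDR itself rather than the whole process line, a sketch like this pulls out the flag value. The function name is made up, and it assumes the flag is passed as --cluster-cidr=<value> or --cluster-cidr <value>.

```shell
# Sketch only: read a process listing on stdin and print the first
# --cluster-cidr value found. Prints nothing if the flag is absent.
extract_cluster_cidr() {
  grep -o -e '--cluster-cidr[= ][^ ]*' | head -n 1 | sed 's/^--cluster-cidr[= ]//'
}
```

On a node this would be used as: ps -ef | extract_cluster_cidr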