Container Failed to start

scott-hiemstra · February 22, 2021, 7:16pm

Thanks in advance to anyone who can assist. I’ve been learning quite a bit about Calico over the past couple of weeks, so please excuse a new user and community member. I am running into an issue with Calico, which has been running on a Kubernetes cluster for a long time with no issues that I am aware of. The calico-kube-controllers deployment does not start. Enabling debug logging didn’t provide much additional information; here are some logs I was able to collect.

2021-02-22T18:38:45.643542239Z 2021-02-22 18:38:45.642 [INFO][1] main.go 75: Loaded configuration from environment config=&config.Config{LogLevel:"debug", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"policy,namespace,serviceaccount,workloadendpoint,node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true}
2021-02-22T18:38:45.643593132Z 2021-02-22 18:38:45.643 [DEBUG][1] load.go 70: Loading config from environment
2021-02-22T18:38:45.643598551Z 2021-02-22 18:38:45.643 [DEBUG][1] client.go 30: Using datastore type 'etcdv3'
2021-02-22T18:38:55.643928454Z 2021-02-22 18:38:55.643 [FATAL][1] main.go 87: Failed to start error=failed to build Calico client: context deadline exceeded

scott-hiemstra · February 22, 2021, 7:20pm

Can anyone point me in a direction where I can obtain more information about what might be failing?

fasaxc · February 23, 2021, 1:43pm

It looks like your Calico installation is using the etcd datastore driver and it is failing to connect to etcd.

Was this always an etcd-based cluster? If not, you may have accidentally installed an etcd driver manifest instead of a Kubernetes one.
Is the etcd configuration correct? I believe it is in the calico-config config map in the kube-system namespace. Perhaps you’re running on a fabric with dynamic IPs and etcd was restarted with a new IP.
Is etcd running and healthy? Have you upgraded or changed your etcd in some way recently?
Are you using Calico host endpoints (host protection policy)? If so, it’s possible to accidentally cut the connection to the datastore.
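
As a first pass at the "is etcd reachable" question, a plain TCP check from the affected node or pod can help separate network problems from everything else. The following is a minimal Python sketch, not a Calico tool: the function name `can_reach` and the endpoint in the usage comment are hypothetical, and the real endpoints would come from your own etcd configuration.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical etcd endpoint; substitute your own):
#   print(can_reach("10.0.0.10", 2379))
```

A successful TCP connection only shows that the port is open. It does not exercise TLS, so a problem with certificates or client authentication would still need a separate check.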

scott-hiemstra · February 23, 2021, 2:00pm

Thanks @fasaxc. Yes, it was always etcd-based, and I have no reason to believe the etcd configuration is incorrect. We’re validating your other questions and points now.

scott-hiemstra · February 24, 2021, 2:51am

While digging through logs, I found that etcd had a hiccup the other day, which is when this started happening. The etcd nodes seemed to lose connectivity with each other, and after a minute or so they re-established it. They all seem healthy now and appear to be communicating fine.

calico-node is also up on all of the worker nodes and seems to be running fine. The only issue is calico-kube-controllers; I still haven’t figured out what is stopping it from communicating with etcd. Still digging.

scott-hiemstra · March 11, 2021, 4:09am

@fasaxc - Sorry for the delay in getting back to you. I appreciate your previous assistance; it was indeed a communication issue between Calico and etcd, and your words led me down the right road. A client certificate for Calico, signed by the etcd CA, had expired, which was causing the issue. Certificates were renewed, old nodes were culled, and everything is happy once again.
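
For anyone landing here with the same symptom, an expired client certificate can be spotted by looking at its expiry date. Here is a minimal Python sketch, assuming you already have the `notAfter=...` line that `openssl x509 -noout -enddate -in <cert file>` prints; the helper names `seconds_until_expiry` and `is_expired` are hypothetical, and only the standard library's `ssl.cert_time_to_seconds` is used for parsing.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds remaining before the certificate expires (negative if already expired).

    `not_after` is a date such as "Jan  5 09:34:43 2018 GMT", optionally
    prefixed with "notAfter=" as printed by `openssl x509 -enddate`.
    """
    # Drop the "notAfter=" prefix if present.
    cert_time = not_after.strip().split("=", 1)[-1]
    expires_at = ssl.cert_time_to_seconds(cert_time)
    if now is None:
        now = time.time()
    return expires_at - now


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """True if the expiry time is at or before `now` (defaults to the current time)."""
    return seconds_until_expiry(not_after, now) <= 0


# Example usage with a made-up expiry line:
#   print(is_expired("notAfter=Jan  5 09:34:43 2018 GMT"))
```

Running a check like this against each client certificate on a schedule, with a warning threshold of a few weeks, is one way to catch the problem before components start timing out.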
