Kubernetes Network Policy blocked by firewall

I am running a 4-node K8s cluster locally with Vagrant on CentOS 7.9, installed with kubeadm. I'm using Kubernetes 1.19.3, Calico CNI 3.17.1, and containerd 1.4.3. I do not have Docker Engine installed.

I am encountering an issue when using a Kubernetes ingress NetworkPolicy to allow requests between pods with matching labels. The firewall is configured using the information from Installing kubeadm | Kubernetes and Calico's Kubernetes requirements. I am using IP-in-IP encapsulation; I have also tried VXLAN encapsulation and hit the same issue.

If the policy is not applied, inter-pod communication works fine with the firewall running. Once the policy is applied, inter-pod communication fails. If I stop the firewall on the host of the requesting pod, communication works. If I stop the firewall on the host of the target pod, communication still fails. I believe the policy itself is correct: I stopped the firewall and did both positive and negative testing. I read the topic [Solved] issue with kubernetes network policy and calico CNI - ingress namespaceSelector not working - #3 by Timo, as it seemed similar, and tried removing the masquerade setting, but that made things worse: all inter-pod communication began failing.

The firewall rules on my worker nodes look like this:

[root@k8s-worker-2 ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0 eth1
  sources: 
  services: dhcpv6-client ssh
  ports: 10250/tcp 30000-32767/tcp 30000-32767/udp 53/udp 53/tcp 179/tcp 179/udp 4789/udp 5473/tcp
  protocols: 
  masquerade: yes
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
	rule protocol value="4" accept
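
For reference, a ruleset like the one above can be reproduced with firewall-cmd roughly as follows (a sketch only; the port list follows the kubeadm and Calico requirements and should be adjusted to your setup):

firewall-cmd --permanent --add-port=10250/tcp                               # kubelet
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=30000-32767/udp                         # NodePort range
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=53/udp                                  # DNS
firewall-cmd --permanent --add-port=179/tcp                                 # BGP
firewall-cmd --permanent --add-port=4789/udp                                # VXLAN
firewall-cmd --permanent --add-port=5473/tcp                                # Typha
firewall-cmd --permanent --add-masquerade
firewall-cmd --permanent --add-rich-rule='rule protocol value="4" accept'   # IP-in-IP (protocol 4)
firewall-cmd --reload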

This is how I reproduce the issue. I start by creating a namespace with kubectl create ns np, then apply the following DaemonSet. It runs an app on each node that listens on port 8080 and returns the name of the pod and its version.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  creationTimestamp: null
  labels:
    app: deployment
  name: foo
  namespace: np
spec:
  selector:
    matchLabels:
      app: deployment
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: deployment
    spec:
      containers:
      - image: dgkanatsios/simpleapp
        name: simpleapp
        ports:
        - containerPort: 8080
        resources: {}
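
I apply it with something like this (the filename is just illustrative):

kubectl apply -f foo-daemonset.yaml
kubectl -n np rollout status ds/foo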

Then I apply another DaemonSet that puts a netshoot pod on each node. These pods are used for sending requests to the foo pods.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot-daemonset 
  namespace: np
  labels:    
    app: netshoot
spec:
  selector:    
    matchLabels:
      name: netshoot-daemonset
  template:    
    metadata:      
      labels:
        name: netshoot-daemonset
    spec:      
      containers:
      - command: ["tail"]
        args: ["-f", "/dev/null"]
        image: nicolaka/netshoot
        name: netshoot-pod

Then I run kubectl -n np get po -o wide:

NAME                       READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
foo-8s8mc                  1/1     Running   0          9m1s    172.16.140.1    k8s-worker-2   <none>           <none>
foo-m24vf                  1/1     Running   0          9m1s    172.16.182.65   k8s-master-2   <none>           <none>
foo-v62mm                  1/1     Running   0          9m1s    172.16.230.2    k8s-worker-1   <none>           <none>
netshoot-daemonset-c86d5   1/1     Running   0          8m55s   172.16.182.66   k8s-master-2   <none>           <none>
netshoot-daemonset-ftnjx   1/1     Running   0          8m55s   172.16.140.2    k8s-worker-2   <none>           <none>
netshoot-daemonset-qzcsn   1/1     Running   0          8m55s   172.16.230.3    k8s-worker-1   <none>           <none>

To send a request from the netshoot pod on k8s-worker-1 to the foo pod running on k8s-worker-2, I use: k exec -it netshoot-daemonset-qzcsn -- wget -qO- 172.16.140.1:8080 --timeout=2

With the firewall started but no network policy I get this result:

Hello world from foo-ksjzp and version 2.0

I tested communication from the netshoot pod on each node to each foo pod in the same fashion; no issues with any of it. So I apply the ingress policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress
  namespace: np
spec:
  podSelector:
    matchLabels:
      app: "deployment"
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          name: netshoot-daemonset
    ports:
    - protocol: TCP
      port: 8080 
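
Since the policy relies entirely on label matching, a quick sanity check is to compare the selectors against the actual pod labels and the rendered policy:

kubectl -n np get pods --show-labels
kubectl -n np describe networkpolicy ingress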

When I try the netshoot command again, it times out. I tried longer timeouts (up to 10 seconds) with the same result.

[root@k8s-master-1 network-policy]# k exec -it netshoot-daemonset-qzcsn -- wget -qO- 172.16.140.1:8080 --timeout=2
wget: download timed out
command terminated with exit code 1

I then stopped the firewalld service on k8s-worker-1, where the netshoot pod is running. When I execute the command again, it succeeds. I then restarted firewalld on k8s-worker-1 and stopped it on k8s-worker-2, where the foo pod is running. When I execute the command again, it succeeds. When I delete the network policy, the requests work again with both firewalls started.

With the network policy removed and the firewalls running on both nodes, I started investigating the traffic between the pods. On k8s-worker-2 I used crictl pods and crictl inspectp to find the Calico interface the foo pod uses (a sketch of that lookup follows the capture below). I then ran tcpdump on that interface and on the tunl0 interface. With the firewalls started and the policy removed, I see traffic on the tunl0 interface and on the pod's interface. After applying the policy, I see traffic on tunl0 but not on the pod's interface. The traffic on tunl0 looks like this:

[root@k8s-worker-2 ~]# tcpdump -i tunl0 -vv -nn
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
13:56:59.538718 IP (tos 0x0, ttl 63, id 5051, offset 0, flags [DF], proto TCP (6), length 60)
    172.16.140.0.59224 > 172.16.230.9.8080: Flags [S], cksum 0xdef3 (correct), seq 11114561, win 28800, options [mss 1440,sackOK,TS val 44534778 ecr 0,nop,wscale 7], length 0
13:57:00.540011 IP (tos 0x0, ttl 63, id 5052, offset 0, flags [DF], proto TCP (6), length 60)
    172.16.140.0.59224 > 172.16.230.9.8080: Flags [S], cksum 0xdb09 (correct), seq 11114561, win 28800, options [mss 1440,sackOK,TS val 44535780 ecr 0,nop,wscale 7], length 0
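
For reference, this is roughly how I locate the pod's cali* interface on the node (the pod name and IP are from the listing above; the interface name is a placeholder):

POD_ID=$(crictl pods --name foo-8s8mc -q)       # sandbox ID of the foo pod on this node
crictl inspectp "$POD_ID" | grep -i '"ip"'      # confirm the pod IP
ip route | grep 172.16.140.1                    # the /32 route shows "... dev caliXXXXXXXXXXX"
tcpdump -i tunl0 -nn port 8080                  # traffic arriving over the IP-in-IP tunnel
tcpdump -i caliXXXXXXXXXXX -nn port 8080        # traffic actually reaching the pod's veth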

This is as far as I've gotten. I'm not yet proficient at debugging iptables, and I'm hoping I simply have something misconfigured. I do have another, older 4-node cluster on CentOS running Kubernetes 1.18.1, Calico CNI 3.15.1, and Docker Engine 19.03. Communication works on that cluster with the firewalls running and the policy applied. I've compared settings between my two clusters and have not found a difference in the configuration or the firewall setup.

I have run diagnostics using calicoctl, but I don't know what to look for in the diagnostic files. I can make them available. Any advice, ideas, or support would be greatly appreciated. Thank you.

As you probably know, both firewalld and kubelet modify iptables rules. To make sure the firewall is not causing the issue, I suggest the following:

  1. Disable firewalld
  2. Stop kubelet
  3. Flush iptables
  4. Start kubelet

This way you should end up with a clean ruleset. Repeat your tests.
I hope this helps. Maybe give it another shot without the masquerading.
In my case, the network policies only worked without pod or namespace selectors, because the source-IP filtering was causing the issue.

Best regards,
Timo

Thank you, Timo, for your response. I tried your suggestions, but they didn't solve the issue. To be clear, I did:

systemctl stop firewalld
systemctl stop kubelet.service
iptables -F
systemctl start kubelet.service

Then, with firewalld stopped, I did some testing with the policy: all good. I then restarted firewalld and got the timeout again. Next I removed the masquerade setting and reloaded firewalld. Now any request (with or without the policy) gives this:

k exec -it netshoot-daemonset-qzcsn -- wget -qO- 172.16.140.6:8080 --timeout=2
wget: can't connect to remote host (172.16.140.6): Host is unreachable
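
For reference, the masquerade change was made roughly like this:

firewall-cmd --permanent --remove-masquerade && firewall-cmd --reload
# and to restore it afterwards:
firewall-cmd --permanent --add-masquerade && firewall-cmd --reload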

So I added the masquerade back, disabled firewalld, stopped kubelet, flushed iptables, and restarted kubelet. I then stepped back and tried different policies. I tried an allow-all ingress policy (no pod or to/from selectors): that worked fine with the firewall enabled. I then tried an ingress policy with just the top-level podSelector: that also worked fine with the firewall enabled. I then labeled my namespace and tried a policy using a namespaceSelector: that failed with the firewalls started.

Here is the policy that worked (just the top-level podSelector):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-ingress
  namespace: np
spec:
  podSelector:
    matchLabels:
      app: "deployment"
  ingress:
  - {}
  policyTypes:
  - Ingress

Here is the policy with the namespaceSelector:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-ingress
  namespace: np
spec:
  podSelector:
    matchLabels:
      app: "deployment"
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
  policyTypes:
  - Ingress

I double-checked that the namespace is labeled:

NAME   STATUS   AGE   LABELS
np     Active   24h   team=operations
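
For completeness, the label was applied with something along these lines:

kubectl label ns np team=operations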

I’m going to keep digging but if you (or anyone) thinks of something else to try please reply.

Cheers,
Andy

The thing is, Calico network policy is a firewall, so you're effectively running two firewalls at once. If either of them blocks the traffic, the traffic isn't going to get through. In some cases they may also “fight” (i.e. try to remove each other's iptables rules).

You either need to:

  • disable the firewall completely, or
  • configure it to allow all the traffic that you might want to allow in your cluster, i.e. allow all protocols on all ports from the list of IPs that your nodes have and from the pod CIDR (a firewalld sketch follows this list).
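
One way to express the second option with firewalld is to put the node and pod networks into a trusted zone. This is only a sketch; the CIDRs are placeholders and must be replaced with your actual node subnet and pod CIDR:

firewall-cmd --permanent --zone=trusted --add-source=192.168.56.0/24   # node network (placeholder)
firewall-cmd --permanent --zone=trusted --add-source=172.16.0.0/16     # pod CIDR (placeholder)
firewall-cmd --reload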

If you disable the firewall completely, it is possible to use Calico to protect the hosts too. See Policy for hosts

Hi Andy,

If I understand correctly, the network policies worked without firewalld, including pod and namespace selectors (as part of ingress or egress rules).
As lwr20 already explained, both kubelet/Calico and firewalld configure iptables. That is one potential source of your issues: always restart kubelet after reconfiguring firewalld, in that order!

If the network policies work as long as no pod or namespace selector is used in the rules, then the problem is very likely related to source IPs: firewalld rewrites the source IP when masquerade is enabled, so a policy that matches on the client pod's labels no longer sees that pod's IP.
We actually ended up adding all nodes of the cluster as sources in a trusted zone of firewalld to avoid port and other issues. That also leaves very few port rules in the public zone.

It would be great if you could confirm the first assumption.
Thanks,
Timo

Hi Timo,
Yes, your assumption is correct. I've done more testing, and I'm sure it's the masquerade causing the problem. I've worked with my network team, and they suggested running without firewalld on the servers: let Kubernetes/Calico handle the internal firewalling (network policies), and they will protect from the outside. So I'm good now. Thank you so much to everyone for your help.
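
For anyone landing here later, disabling firewalld permanently on each node comes down to:

systemctl disable --now firewalld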
Andy