I have a situation where a Kubernetes NetworkPolicy is not working for me:
No policy → connections across nodes work
With only a port-based ingress filter → connections across nodes still work
However, after adding a namespaceSelector to the same policy → only same-node traffic works (e.g. node2 → node2, but not node2 → node3)
I can see that the connection remains in the SYN_SENT state.
I can see that iptables is populated, but I couldn't tell whether any of the rules were causing the problem.
I have two scenarios with pod-to-pod traffic:
a) Prometheus
b) Kong API gateway
Both exhibit the same behavior: same-node targets work, but cross-node targets time out.
I am using the default install for a small on-prem cluster, as advised by the kubeadm documentation.
images: calico/kube-controllers:v3.16.1 and calico/node:v3.16.1
Is this a known issue? I couldn’t find anything related.
I'm afraid I don't have a full answer for you yet, but three things spring to mind.
Can you tell us more about how your nodes are connected, and if you’re using an overlay (vxlan/ip-in-ip) or not?
Can you share an example of NetworkPolicy that stops the inter-node communication when you add a namespaceSelector?
Have you tried running 'sudo watch iptables-save -c' on the source and destination nodes while trying to establish an inter-node connection? If iptables is dropping the packet, an increasing counter should indicate which rule is responsible.
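To illustrate what to look for, here's a toy example of diffing two counter snapshots. The rules, counters, and file names are made up for demonstration; on a real node you'd capture the snapshots with 'sudo iptables-save -c > before.txt', attempt the connection, then capture again.

```shell
# Fabricated "before" snapshot (counters are the [pkts:bytes] prefixes):
cat > before.txt <<'EOF'
[10:840] -A INPUT -p tcp --dport 9090 -j ACCEPT
[0:0] -A INPUT -j DROP
EOF
# Fabricated "after" snapshot, taken following the connection attempt:
cat > after.txt <<'EOF'
[10:840] -A INPUT -p tcp --dport 9090 -j ACCEPT
[3:180] -A INPUT -j DROP
EOF
# Lines appearing only in the "after" snapshot are rules whose counters moved:
diff before.txt after.txt | grep '^>'
# -> > [3:180] -A INPUT -j DROP   (the DROP rule is what's eating the packets)
```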
Thanks for giving me the right hints. After diffing the packet counts from before and after applying the namespaceSelector, it turns out that the sourceIP ipset rule is not firing.
I'm not sure why this is the case, but after disabling firewalld (I am using CentOS 7 with IP-in-IP), it started working.
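For anyone following along, taking firewalld out of the picture for a test is just the standard systemd commands (run as root or via sudo):

```shell
# Stop firewalld for the current boot and keep it from starting again:
sudo systemctl stop firewalld
sudo systemctl disable firewalld
# Re-enable later with: sudo systemctl enable --now firewalld
```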
I tried adding rules to the firewall to allow IP tunneling, but that doesn't seem to be the issue. Here's what I ended up with:
Unfortunately, modifying the firewall rules as outlined above did not work.
Suggestions on how to configure firewalld properly for Calico IP-in-IP would be greatly appreciated.
It’s good that it works after disabling firewalld, but I also don’t yet understand in detail why that would be. If firewalld was the problem, it should have affected the inter-node communication before you added the namespaceSelector to your NetworkPolicy, as well as after.
Anyway, focusing first on what you need to allow through firewalld, please see:
Regarding the namespaceSelector: without it, the allowed 'from' peers are all pods in the helloworld-netcore-master namespace. With the namespaceSelector as above, the allowed peers are all pods in namespaces carrying the nbly-role: ingress-kong label. Does that help at all in your setup?
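To make that concrete, a policy along those lines might look like the sketch below. The policy name and port are my own guesses, not your actual manifest; only the namespace and label come from this thread.

```yaml
# Sketch only: metadata.name and the port are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-kong              # hypothetical name
  namespace: helloworld-netcore-master
spec:
  podSelector: {}                    # applies to all pods in this namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Without this namespaceSelector, the allowed peers default to
        # pods in helloworld-netcore-master itself.
        - namespaceSelector:
            matchLabels:
              nbly-role: ingress-kong
      ports:
        - protocol: TCP
          port: 8000                 # hypothetical port
```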
Finally, you mentioned that "the sourceIP ipset rule is not firing" - I'm not sure I understand; can you show me more precisely what you mean?
Yes, I have added all the ports mentioned in the documentation. However, I am unclear about this specific requirement for IP-in-IP:
"IP-in-IP, often represented by its protocol number 4"
I tried adding a firewalld rule for this, but I'm not sure whether it was correct (see above).
After some more digging I suspected that some source NAT was going on. Since we are using services of type ClusterIP, it must have been firewalld. I removed masquerade=yes and rebooted the machines. Voilà, it is working now.
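For others hitting this: the per-zone masquerade setting can be inspected and removed with firewall-cmd. The zone name below is an assumption; check yours first with 'firewall-cmd --get-active-zones'.

```shell
# Check whether masquerading (source NAT) is enabled in the zone:
sudo firewall-cmd --zone=public --query-masquerade
# Remove it permanently and reload the firewall:
sudo firewall-cmd --permanent --zone=public --remove-masquerade
sudo firewall-cmd --reload
```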
The thing about IP-in-IP is that 4 is not a port number, as with TCP- or UDP-based protocols; it is the IP protocol number, which lives one layer below ports, in the IP header itself. The iptables match would be --protocol 4 (or -p 4). I don't know whether there's a firewalld equivalent, but probably there is.
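A sketch of both forms, in case it helps. Note the firewalld protocol name is an assumption on my part: on Linux, /etc/protocols typically maps protocol number 4 to "ipencap", so check your own /etc/protocols before relying on it.

```shell
# iptables form: match the numeric IP protocol, not a port:
sudo iptables -A INPUT -p 4 -j ACCEPT
# firewalld form: a rich rule matching the same protocol by its
# /etc/protocols name (commonly "ipencap" for protocol 4):
sudo firewall-cmd --permanent --add-rich-rule='rule protocol value="ipencap" accept'
sudo firewall-cmd --reload
```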