2 Node K8S, MetalLB, pihole service externalTrafficPolicy #233
-
I just wanted to back up @quotidian-ennui's experience with my own. I came across this after fighting with Pihole for a couple of days on my new K3s installation. I'm new to Kubernetes and still very much in the early learning phase. My setup is a 3-node K3s cluster running on two RPis and an Intel NUC. I'm using MetalLB for load balancing, Traefik for ingress and Longhorn for shared storage (I think those are the right terms). I used Mojo2600's Helm chart for deploying Pihole (of course, or else why would I be here?).

All my HTTPS traffic for the UIs for Traefik, Portainer, Longhorn and Pihole routes to the respective services as expected. However, Pihole would only respond to DNS on certain machines. After much digging and reading, I think I've figured out why it was so intermittent: when a device ARPs for an IP address that MetalLB uses for load balancing, one of the nodes responds with the MAC address of its own network card, and that node is then used by the device for all further communication. With Pihole, DNS queries were only answered if the node that answered the ARP request happened to be hosting the Pihole pod. I verified this from Windows as follows:
If the MAC address for the LB IP matched the host currently running the Pihole pod, things worked fine; if not, the test failed. I then ran arp -d to clear the ARP cache; after a few tries I would get a different MAC address and would repeat the same test. Once I set externalTrafficPolicy=cluster, Pihole worked in all combinations. According to https://metallb.universe.tf/usage/, setting externalTrafficPolicy to local prevents requests from being forwarded off the node that receives them.
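For reference, a minimal sketch of a LoadBalancer Service carrying DNS with the policy set to Cluster, assuming MetalLB hands out the address; the names, label and IP here are placeholders rather than anything generated by the Helm chart:

```yaml
# Sketch only: a DNS LoadBalancer Service with externalTrafficPolicy: Cluster.
# Names, labels and the address are placeholders, not chart-generated values.
apiVersion: v1
kind: Service
metadata:
  name: pihole-dns
spec:
  type: LoadBalancer
  # Cluster: any node can accept the traffic and forward it to the pod
  #          (the client source IP gets SNATed along the way).
  # Local:   the client source IP is preserved, but only the node actually
  #          running the pod will answer.
  externalTrafficPolicy: Cluster
  loadBalancerIP: 192.168.1.250      # placeholder address from a MetalLB pool
  selector:
    app: pihole
  ports:
    - name: dns-udp
      port: 53
      targetPort: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      targetPort: 53
      protocol: TCP
```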
So, it appears we have an issue here. Using externalTrafficPolicy=cluster means that everything appears to be coming from one IP address, which prevents setting rules per client, but setting externalTrafficPolicy=local breaks DNS whenever the Pihole pod isn't on the node that answered the MetalLB ARP request. Anybody have an idea how to reconcile this? Maybe I'm missing something fundamental here about how to configure MetalLB or Pihole.
-
Bumping this topic. I have deployed a new cluster just to test Traefik. After a lot of reading, I see I have this "issue", but with some differences.
Any ideas?
-
Simple: the fix when using L2 mode with MetalLB is to run Pihole with enough replicas to have one on each node, using topology spread constraints or anti-affinity to make sure one lands on each node. This way, whichever node is the MetalLB speaker will have a Pihole pod to respond. The downside is that you won't be able to see your logs easily without a sidecar to forward them to a log aggregation platform, as your ingress may drop you onto an inactive pod.

Advanced: or use BGP mode with MetalLB, and the node with the pod will advertise the IP assigned to the service for that pod. If you have more than one pod on more than one node, each node with a pod will advertise it.
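For concreteness, a minimal sketch of the spread-constraint idea on a plain Deployment; the labels, image tag and replica count are illustrative and not taken from the Helm chart:

```yaml
# Illustrative only: spread Pi-hole replicas so one lands on each node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole
spec:
  replicas: 3                         # e.g. one per node in a 3-node cluster
  selector:
    matchLabels:
      app: pihole
  template:
    metadata:
      labels:
        app: pihole
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: pihole
      containers:
        - name: pihole
          image: pihole/pihole:latest           # placeholder tag
          ports:
            - containerPort: 53
              protocol: UDP
```

A required podAntiAffinity keyed on kubernetes.io/hostname achieves much the same effect if topology spread constraints aren't available on your cluster version.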
-
I'd like to ask a question as to whether I'm doing the right thing or even if the right thing matters in this instance, provided it works. I couldn't find any similar discussions using externalTrafficPolicy as the search term so here we are. I'm using the helm chart version pihole-2.9.0 / appVersion = 2022.05
My confusion here lies in the default value for externalTrafficPolicy; Local has the side-effect of not losing the source IP, while Cluster seems more correct because we're already trying to run in Kubernetes (the implication being that you're going to end up with more than one node at some point).

I have a 2-node K8S cluster with MetalLB running in L2 mode, nothing actually edgy. I had the problem where Pihole would deploy onto node2 but no DNS traffic would be routed to it. I'm using nginx as the ingress controller.
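(Purely for context, and assuming a recent MetalLB release with the CRD-based configuration rather than the older ConfigMap: an L2 setup is just an address pool plus an L2Advertisement along these lines, with the pool name and range below being placeholders.)

```yaml
# Assumed/illustrative MetalLB L2 configuration (CRD style, MetalLB >= 0.13).
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.0.99-192.168.0.110     # placeholder range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```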
In almost all situations node1 is the metallb leader; node2 is not.
This is the configuration that I ended up with, in order to make the pihole deploy well enough to work in my setup on either node.
As far as I can work out, metallb running in L2 mode means that the leader gets all the traffic for 192.168.0.99 and then has to decide where it goes, which according to my limited brain works something like this:

- If externalTrafficPolicy == Local && pihole is deployed on node1: the metallb leader gets the request and does know what to do with it, so nslookup requests from client machines work.
- If externalTrafficPolicy == Local && pihole is deployed on node2: the metallb leader gets the request and doesn't know what to do with it, so nslookup requests time out.
- If externalTrafficPolicy == Cluster and you're running with 2 separate services (i.e. mixedService == false), then you get a few things stuck in a pending state when you do kubectl get svc -A.
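Purely as an illustration of where the setting lives in the chart values (this is not the configuration referred to above, and the serviceDns / serviceWeb key names and the annotation should be checked against values.yaml for the chart version in use):

```yaml
# Assumed key names; verify against the chart's values.yaml.
mixedService: false                   # keep separate TCP/UDP DNS services
serviceDns:
  type: LoadBalancer
  externalTrafficPolicy: Cluster      # works whichever node is the L2 speaker
  loadBalancerIP: 192.168.0.99        # the address used above
  annotations:
    # MetalLB annotation that lets several Services share one IP; without
    # something like this, split services can sit pending waiting for an address.
    metallb.universe.tf/allow-shared-ip: pihole
serviceWeb:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  loadBalancerIP: 192.168.0.99
  annotations:
    metallb.universe.tf/allow-shared-ip: pihole
```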
Consequences