This repository was archived by the owner on Apr 24, 2024. It is now read-only.

Commit dc5275e

Update README
1 parent e752bb7 commit dc5275e

1 file changed

+40
-79
lines changed

README.md

Lines changed: 40 additions & 79 deletions
Original file line number | Diff line number | Diff line change
@@ -6,35 +6,32 @@
66
- [Why this project is needed](#why-this-project-is-needed)
77
- [How it works](#how-it-works)
88
- [Internals](#internals)
9-
- [WARNING: potential instability](#warning-potential-instability)
9+
- [Potential instability](#potential-instability)
1010
- [Alternative solutions](#alternative-solutions)
1111
- [Deployment](#deployment)
1212
- [Helm](#helm)
1313
- [Local testing](#local-testing)
1414
- [Configuration](#configuration)
1515
- [Environment variables](#environment-variables)
16-
- [Set up an egress firewall using Kyverno](#set-up-an-egress-firewall-using-kyverno)
1716
- [CoreDNS](#coredns)
1817
- [RBAC](#rbac)
1918
- [Roadmap](#roadmap)
2019
<!--toc:end-->
2120

2221
## Project description
2322

24-
dns-resolution-operator is a Kubernetes operator that creates API resources with the resolved IP addresses of domains. This project is in early development with a fully functional alpha release.
23+
dns-resolution-operator is a Kubernetes operator that creates NetworkPolicies in which egress traffic to a list of domain names is allowed. The operator takes care of resolving the domain names to a list of IP addresses. This project is in early development with a fully functional alpha release.
2524

26-
This operator allows users to create an egress or ingress firewall in which certain hostnames (FQDNs) are whitelisted or blocked. Another operator, such as Kyverno, can be combined with dns-resolution-operator to create NetworkPolicies containing the resolved IP addresses of a list of hostnames.
27-
28-
The operator does its best to update resources immediately after the DNS server's cache expires. However, creating NetworkPolicies with this method may still cause instability ([see below](#warning-potential-instability)). This project is only the first step towards creating a stable long-term solution. The next step will be to create a CoreDNS plugin that delivers new uncached records to the operator a few seconds before it refreshes the cache for other clients. Once this is done, the road is free to implement FQDN resolution in native Kubernetes NetworkPolicies.
25+
This operator is best used in combination with the [k8s_cache plugin](https://github.com/delta10/k8s_cache) for CoreDNS. The plugin allows the operator to update NetworkPolicies before the cluster's DNS cache expires. Without the plugin, a small percentage of requests to domains with dynamic DNS responses will fail ([see below](#potential-instability)).
2926

3027

3128
### Why this project is needed
3229

3330
Kubernetes [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can be used to allow or block ingress and egress traffic to parts of the cluster. While NetworkPolicies support allowing and blocking IP ranges, there is no support for hostnames. Such a feature is particularly useful for those who want to block all egress traffic except for a couple of whitelisted hostnames.
3431

35-
[Existing solutions](#alternative-solutions) all have their limitations. There is a need for a simple solution based on DNS, that does not require a proxy nor altering DNS records and that works for any type of traffic (not just HTTPS).
32+
Existing solutions all have their [limitations](#alternative-solutions). There is a need for a simple solution based on DNS that requires neither a proxy nor altered DNS records, and that works for any type of traffic (not just HTTPS). This solution should also be stable for domains with dynamic DNS responses.
3633

37-
dns-resolution-operator is this simple solution. All it does is resolve FQDNs and store them in "IPMap" API resources. Another operator such as Kyverno can be used to generate native Kubernetes NetworkPolicies using these IP addresses. [See the instructions below.](#set-up-an-egress-firewall-using-kyverno)
34+
dns-resolution-operator is this simple solution. All it does is resolve FQDNs, store them in "IPMap" custom resources, and update NetworkPolicies with the resolved IP addresses.
3835

3936
### How it works
4037

@@ -47,61 +44,59 @@ metadata:
4744
name: whitelist-google
4845
namespace: default
4946
spec:
50-
createDomainIPMapping: true # whether to keep the association between domain and IP in the resulting IPMap
5147
domainList:
5248
- google.com
5349
- www.google.com
5450
```
5551
56-
It will then create IPMaps like the following:
52+
It will then create NetworkPolicies like the following:
5753
58-
```
59-
apiVersion: dns.k8s.delta10.nl/v1alpha1
60-
kind: IPMap
54+
```yaml
55+
apiVersion: networking.k8s.io/v1
56+
kind: NetworkPolicy
6157
metadata:
6258
name: whitelist-google
6359
namespace: default
64-
data:
65-
domains:
66-
- ips:
67-
- 142.250.179.142/32
68-
- 142.251.39.110/32
69-
name: google.com
70-
- ips:
71-
- 142.251.36.36/32
72-
- 142.250.179.164/32
73-
name: www.google.com
60+
spec:
61+
podSelector: {}
62+
policyTypes:
63+
- Egress
64+
egress:
65+
- to:
66+
- ipBlock:
67+
cidr: 142.250.179.132/32
68+
- to:
69+
- ipBlock:
70+
cidr: 172.217.23.196/32
7471
```
7572
76-
For creating NetworkPolicies [see below](#set-up-an-egress-firewall-using-kyverno).
73+
To keep track of the mapping between domain name and policy, the operator also creates custom resources called IPMaps. Administrators who want to generate customized NetworkPolicies can do so on the basis of the IPMaps (using another operator such as Kyverno). To disable the generation of NetworkPolicies, set `spec.generateType` to "IPMap".
7774

7875
### Internals
7976

80-
The controller pod of dns-resolution-operator watches API resources of kind DNSResolver, which contain lists of domain names. The controller looks up the IP addresses of all domain names and appends them to an API resource of kind IPMap.
77+
The controller pod of dns-resolution-operator watches API resources of kind DNSResolver, which contain lists of domain names. The controller looks up the IP addresses of all domain names and appends them to a custom resource of kind IPMap and a NetworkPolicy, both with the same name and namespace as the DNSResolver.
8178

82-
On each reconciliation of a DNSResolver, the controller first queries the API server for kube-dns endpoints. It adds each endpoint to its list of DNS servers (the kube-dns service is bypassed). It then queries each DNS server for A records for each of the domains in the DNSResolver. It is necessary to query all servers, since each server has its own internal cache.
79+
On each reconciliation of a DNSResolver, the controller first queries the API server for endpoints of the DNS service (by default kube-dns in namespace kube-system). It adds each endpoint to its list of DNS servers (but not the service ClusterIP). It then queries each DNS server for A records for each of the domains in the DNSResolver. (It is necessary to query all servers, since each server has its own internal cache.)
8380

84-
The IP addresses are then appended to an IPMap with the same name, if they are not already in that IPMap. An internal cache in the controller is also updated with the last time that each IP address was encountered. Finally, IP addresses in the IPMap that have not been seen for a certain amount of time are deleted.
81+
The IP addresses are then appended to an IPMap and NetworkPolicy with the same name. An internal cache in the controller is also updated with the last time that each IP address was encountered. Finally, IP addresses that have not been seen for a certain amount of time (`IP_EXPIRATION`) are deleted.
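
The expiry step described above could look roughly like the following Go sketch. This is not the operator's actual code: the function names `pruneExpired` and `examplePrune`, the map-based cache, and the sample timestamps are all assumptions made for illustration. Only the idea comes from the text: an IP is dropped once its last sighting is older than `IP_EXPIRATION`.

```go
package main

import (
	"sort"
	"time"
)

// pruneExpired is a hypothetical helper. It removes every IP from lastSeen
// whose last sighting is older than expiration (relative to now) and returns
// the surviving IPs in sorted order.
func pruneExpired(lastSeen map[string]time.Time, now time.Time, expiration time.Duration) []string {
	kept := make([]string, 0, len(lastSeen))
	for ip, seen := range lastSeen {
		if now.Sub(seen) <= expiration {
			kept = append(kept, ip)
		} else {
			// Deleting entries while ranging over a map is safe in Go.
			delete(lastSeen, ip)
		}
	}
	sort.Strings(kept)
	return kept
}

// examplePrune runs pruneExpired on made-up sample data with a one-hour
// expiration, the value that time.ParseDuration("1h") would yield.
func examplePrune() []string {
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	lastSeen := map[string]time.Time{
		"142.250.179.132/32": now.Add(-10 * time.Minute), // seen recently: kept
		"172.217.23.196/32":  now.Add(-2 * time.Hour),    // not seen for 2h: dropped
	}
	return pruneExpired(lastSeen, now, time.Hour)
}
```

In this sketch, calling `examplePrune()` keeps only the address that was seen within the last hour. A longer expiration keeps more stale addresses in the policy, which trades strictness for fewer failed connections.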
8582

8683
The DNSResolver is requeued for reconciliation when the earliest cache expires in the full list of records it received (based on the TTL response).
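
As a rough illustration of the requeue rule, the Go sketch below picks the smallest TTL among the received records and caps it at a maximum, in the spirit of `MAX_REQUEUE_TIME`. The function name `requeueSeconds` and the use of plain `uint32` seconds are assumptions for this example, not the operator's real API.

```go
package main

// requeueSeconds is a hypothetical helper. It returns the number of seconds
// to wait before the next reconciliation: the smallest TTL among the received
// records, capped at maxRequeue. With no records it falls back to maxRequeue.
func requeueSeconds(ttls []uint32, maxRequeue uint32) uint32 {
	delay := maxRequeue
	for _, ttl := range ttls {
		if ttl < delay {
			delay = ttl
		}
	}
	return delay
}
```

For example, `requeueSeconds([]uint32{300, 42, 120}, 3600)` selects the record whose cache expires first, so the DNSResolver would be reconciled again after 42 seconds.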
8784

88-
The user is responsible for generating NetworkPolicies on the basis of IPMaps. I recommend to use Kyverno ([see below](#set-up-an-egress-firewall-using-kyverno)).
85+
### Potential instability
8986

90-
### WARNING: potential instability
87+
Whenever a DNS server clears its cache, there is a period of about 2 seconds when NetworkPolicies are not yet updated with the new IP addresses. This means that connection attempts to these hostnames might fail for about 2 seconds. This problem is best resolved by using the [k8s_cache plugin](https://github.com/delta10/k8s_cache).
9188

92-
Whenever a DNS server clears its cache, there is a period of about 2 seconds when any NetworkPolicies are not yet updated with the new IP addresses. If the intention is to whitelist hostnames for egress traffic, this means that connection attempts to these hostnames might fail for about 2 seconds.
89+
Without the plugin, a small percentage of requests to hosts with dynamic DNS responses may fail. In my testing with `IP_EXPIRATION` set to "12h", requests to www.google.com eventually have a failure rate of around 0.02%. However, in the first 10 minutes, the failure rate is about 1%.
9390

94-
To solve this completely, a CoreDNS plugin needs to be written that delivers new records to our controller a few seconds before it refreshes the cache for other clients.
95-
96-
As long as this plugin does not exist, there are a few things you can do to reduce the amount of connection failures:
97-
- Ensure that all pods in the cluster use kube-dns for DNS resolution.
98-
- Make sure that the DNS service sends the remaining cache duration as TTL, which is the default in CoreDNS (see the `keepttl` option [in CoreDNS](https://coredns.io/plugins/cache/)).
99-
- Increase the cache duration of the DNS service ([see below](#coredns)).
100-
- Set a higher IPExpiration (see [Environment variables](#environment-variables)). This is the amount of time that IPs are stored in an IPMap since they were last seen in a DNS response.
91+
When not using k8s_cache, there are a few things you can do to reduce the number of connection failures:
92+
- Ensure that all pods in the cluster use a caching DNS server. The instances of this server should be endpoints of a Kubernetes service. dns-resolution-operator should be configured to use this service ([see below](#environment-variables)).
93+
- Make sure that the DNS server sends the remaining cache duration as TTL, which is the default in CoreDNS (see the `keepttl` option [in CoreDNS](https://coredns.io/plugins/cache/)).
94+
- Increase the cache duration of the DNS server ([see below](#coredns)).
95+
- Set a higher `IP_EXPIRATION` (see [Environment variables](#environment-variables)). This is the amount of time that IPs are remembered since they were last seen in a DNS response.
10196

10297
### Alternative solutions
10398
- [egress-operator](https://github.com/monzo/egress-operator) by Monzo. A very smart solution that runs a Layer 4 proxy for each whitelisted domain name. However, you need to run a proxy pod for each whitelisted domain, and you need to install a CoreDNS plugin to redirect traffic to the proxies. See also their [blog post](https://github.com/monzo/egress-operator).
104-
- [FQDNNetworkPolicies](https://github.com/GoogleCloudPlatform/gke-fqdnnetworkpolicies-golang). The GKE project is no longer maintained, but [there is a fork here](https://github.com/nais/fqdn-policy). The GKE project is quite similar to ours, but doesn't work well with hosts that dynamically return different A records. This project aims to have better stability in those situations ([see above](#warning-potential-instability)).
99+
- [FQDNNetworkPolicies](https://github.com/GoogleCloudPlatform/gke-fqdnnetworkpolicies-golang). The GKE project is no longer maintained, but [there is a fork here](https://github.com/nais/fqdn-policy). The GKE project is quite similar to ours, but doesn't work well with hosts that dynamically return different A records. This project aims to have better stability in those situations ([see above](#potential-instability)).
105100
- Service meshes such as Istio ([see docs](https://istio.io/latest/docs/tasks/traffic-management/egress/egress-control)) can be used to create an HTTPS egress proxy that only allows traffic to certain hostnames. Such a solution does not use DNS at all but TLS SNI (Server Name Indication). However, it can only be used for HTTPS traffic.
106101
- Some network plugins have a DNS-based solution, like CiliumNetworkPolicies ([see docs](https://docs.cilium.io/en/stable/security/policy/language/#dns-based)).
107102
- There is a [proposal](https://github.com/kubernetes-sigs/network-policy-api/blob/main/npeps/npep-133.md) to extend the NetworkPolicy API with an FQDN selector.
@@ -152,49 +147,16 @@ The following environment variable control the behaviour of the controller.
152147
|---------------- | --------------- | --------------- |
153148
| DNS_ENABLE_IPV6 | Set to `1` to do AAAA lookups | `0` |
154149
| DNS_ENVIRONMENT | `kubernetes`: use kube-dns pods as DNS servers<br>`resolv.conf`: use all DNS servers from `/etc/resolv.conf`<br>`resolv.conf-tcp`: same as `resolv.conf` but use TCP instead of UDP | `kubernetes` |
155-
| IP_EXPIRATION | How long to keep IPs that have not been seen in IPMaps (uses [ParseDuration](https://pkg.go.dev/time#ParseDuration)) | `12h` |
156-
| MAX_REQUEUE_TIME | How many seconds to wait until reconciling a DNSResolver after a reconciliation | `3600` |
157-
150+
| DNS_UPSTREAM_SERVICE | Name of the cluster DNS service in namespace kube-system | `kube-dns` |
151+
| IP_EXPIRATION | How long to keep IPs after they were last seen in a DNS response (uses [ParseDuration](https://pkg.go.dev/time#ParseDuration)) | `1h` |
152+
| MAX_REQUEUE_TIME | The maximum number of seconds to wait before reconciling a DNSResolver again after a successful reconciliation | `3600` |
158153

159-
### Set up an egress firewall using Kyverno
160-
161-
Kyverno can be used to generate NetworkPolicies to your liking with IPMaps as input. For example. the following Kyverno ClusterPolicy will create an egress whitelist for every IPMap, with the same name and namespace.
162-
163-
```yaml
164-
apiVersion: kyverno.io/v1
165-
kind: ClusterPolicy
166-
metadata:
167-
name: ip-whitelist
168-
spec:
169-
generateExisting: true
170-
rules:
171-
- name: generate-whitelists
172-
match:
173-
any:
174-
- resources:
175-
kinds:
176-
- dns.k8s.delta10.nl/v1alpha1/IPMap
177-
generate:
178-
apiVersion: networking.k8s.io/v1
179-
kind: NetworkPolicy
180-
name: "{{request.object.metadata.name}}"
181-
namespace: "{{request.object.metadata.namespace}}"
182-
synchronize: true
183-
data:
184-
spec:
185-
# select all pods in the namespace
186-
podSelector: {}
187-
policyTypes:
188-
- Egress
189-
egress:
190-
- to: "{{ (request.object.data.domains[].ips)[] | map(&{cidr: @}, @) | items(@, 'foo', 'ipBlock') }}"
191-
# the above is a workaround. the below doesn't work; see https://github.com/kyverno/kyverno/issues/9668
192-
# - to: "{{(request.object.data.domains[].ips)[] | map(&{ipBlock: {cidr: @} }, @)}}"
193-
```
194154

195155
### CoreDNS
196156

197-
When `dns-resolution-operator` is used to create a NetworkPolicy firewall, it is advisable to increase the default cache duration for external domains. Below is a suggested configuration.
157+
It is best to set up CoreDNS with k8s_cache instead of cache. For instructions, see [k8s_cache](https://github.com/delta10/k8s_cache).
158+
159+
If you are not using k8s_cache, stability might improve if you increase the cache duration for external domains. For example:
198160

199161
```yaml
200162
apiVersion: v1
@@ -223,6 +185,5 @@ The operator comes with two custom resources: `dnsresolvers.dns.k8s.delta10.nl`
223185
## Roadmap
224186

225187
Plans for the future:
226-
- Create a CoreDNS plugin to completely get rid of instability
227-
- Add a method to directly create NetworkPolicies instead of IPMaps
188+
- Create a custom resource to have more control over the resulting NetworkPolicy, similar to [FQDNNetworkPolicies](https://github.com/GoogleCloudPlatform/gke-fqdnnetworkpolicies-golang).
228189
- Create a custom reconciliation queue for specific combinations of IPMap, DNS server and domain. This will greatly reduce the number of lookups.
