NETOBSERV-2429: Deploy FLP as a service #1953
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
@jotak: This pull request references NETOBSERV-2247, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: expected the spike to target either version "4.21." or "openshift-4.21.", but it targets "netobserv-1.11" instead.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
One small comment; the PR is mostly LGTM.
I really like your approach to HPA and enforcing or not the number of replicas.
About connection tracking: if I remember correctly, we need the flows captured from both the source and the destination to go to the same FLP pod. If that's the case, this will break it.
@jotak: This pull request references NETOBSERV-2429, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 8647d10 to d21cf8b
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1953      +/-   ##
==========================================
- Coverage   71.72%   71.12%   -0.61%     
==========================================
  Files          80       80              
  Lines       10723    10787      +64     
==========================================
- Hits         7691     7672      -19     
- Misses       2626     2704      +78     
- Partials      406      411       +5     
```

Flags with carried forward coverage won't be shown.
🎉
I added a validation check to make this combination forbidden.
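For reference, here is the kind of spec that the new validation check would reject: conversation tracking needs all flows of a conversation to reach the same FLP instance, which the Service model cannot guarantee without sticky sessions. This is a hypothetical sketch; it assumes the existing `processor.logTypes` field is what toggles conversation tracking:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Service   # the new mode introduced by this PR
  processor:
    logTypes: Conversations  # conntrack is stateful; combined with Service, the webhook rejects this
```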
New images:

They will expire after two weeks. To deploy this build:

```bash
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:1ef2f81 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-1ef2f81
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-1ef2f81
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
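Not shown in the bot output: once the CatalogSource is applied, the operator can be installed from it through an OLM Subscription. A minimal sketch, assuming the package is named `netobserv-operator` and published on a `stable` channel (both to be verified against the bundle):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-dev
  namespace: openshift-operators   # default global operator namespace
spec:
  channel: stable                  # assumption: check the channel exposed by the dev catalog
  name: netobserv-operator         # assumption: package name in the bundle
  source: netobserv-dev            # the CatalogSource defined above
  sourceNamespace: openshift-marketplace
```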
Force-pushed from 8beb2d8 to 3536cf8
This is an intermediate alternative between the Kafka mode and the Direct mode, more suitable for quick install on large clusters (Kafka mode is a more complex setup, whereas Direct mode isn't suitable on large clusters due to the memory consumption of FLP). To use it, set `deploymentModel` to `Service`. There are potential caveats to check:

- Without sticky sessions, there is no guarantee that the agents talk to the same FLP instance. I don't think it's an issue in the nominal case, but it might be a problem for conversation tracking.
Replicas were being reconciled, despite being unmanaged, whenever something else triggered a reconcile event
Force-pushed from 3536cf8 to 1d83851
New images:

They will expire after two weeks. To deploy this build:

```bash
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:2fb9bd2 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-2fb9bd2
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-2fb9bd2
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
- New rule: for the Service deploymentModel, allow port 2055 from host-network
- Remove most rules regarding the communication with agents: as they use host-network, there is no need to allow traffic between the main and privileged namespaces. As a result, the netpol in the privileged namespace denies pretty much everything (except the Prometheus scraper)
- Traffic with openshift-console is only needed for ingress, not egress
- Traffic to the API server is restricted to the openshift-apiserver and openshift-kube-apiserver namespaces (note that an OVN bug requires setting an empty pod selector to make it work, see https://issues.redhat.com/browse/OSDOCS-14395)
- The rule to the API server was duplicated; one was removed
- Remove the rule to Loki when it's installed in the netobserv namespace
- Optimize rules by merging similar rules affecting several namespaces
@OlivierCazade in the last commit 3b71182 I took the opportunity to further restrict the netpol rules; I found a couple of areas to harden, see the commit description for the details. Here are the policies that we get as a result (here with Loki in the same namespace):

Main ns:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netobserv
  namespace: netobserv
spec:
  egress:
  - to:
    - podSelector: {} # inside traffic
  - ports:
    - port: 6443 # API server
      protocol: TCP
    to:
    - namespaceSelector:
        matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
          - openshift-apiserver
          - openshift-kube-apiserver
      podSelector: {}
  - to:
    - namespaceSelector: # other namespaces
        matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
          - openshift-dns
          - openshift-monitoring
      podSelector: {}
  ingress:
  - from:
    - podSelector: {} # inside traffic
  - from:
    - namespaceSelector: # openshift-console/9001
        matchLabels:
          kubernetes.io/metadata.name: openshift-console
    ports:
    - port: 9001
      protocol: TCP
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/host-network: ""
    ports:
    - port: 9443 # operator webhook
      protocol: TCP
    - port: 2055 # FLP collector (only needed with the new Service deployment model)
      protocol: TCP
  - from:
    - namespaceSelector: # other namespaces
        matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
          - openshift-monitoring
      podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

Agent ns:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netobserv
  namespace: netobserv-privileged
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: In
          values:
          - openshift-monitoring # prometheus scraper
      podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```
(I may split this into a different PR if needed)
Description
This is an intermediate alternative between the Kafka mode and the Direct mode, more suitable for quick install on large clusters (Kafka mode is a more complex setup, whereas Direct mode isn't suitable on large clusters due to the memory consumption of FLP).
To use it, set `deploymentModel` to `Service`. There's a caveat with conversation tracking because of the absence of sticky sessions: agents not always talking to the same FLP instance doesn't play well with this feature, which requires statefulness. For that reason, enabling conntrack results in an error from the validation hook.
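For illustration, a minimal FlowCollector using the new model could look like this (a sketch against the v1beta2 API; `Service` is the value this PR adds alongside `Direct` and `Kafka`):

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  # FLP is deployed as a Deployment behind a Service that the agents target,
  # instead of per-node instances (Direct) or a Kafka consumer (Kafka).
  deploymentModel: Service
```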
Also, I'd like to deprecate the embedded HPA configuration and replace it with just a flag telling whether the number of replicas has to be enforced or not. That way, users can set up their own HPA as they want, and we don't need to deal with the HPA API anymore.
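To illustrate the intent, here is a sketch of the kind of HPA a user could then manage themselves, once the operator no longer enforces the replica count. The Deployment name `flowlogs-pipeline` and the netobserv namespace are assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flp
  namespace: netobserv
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flowlogs-pipeline   # assumption: FLP Deployment name in Service mode
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```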
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.