Fix incorrect default deny log when NetworkPolicy is chunked #488
Conversation
When a NetworkPolicy with many endpoints is chunked into multiple PolicyEndpoint resources, the agent was incorrectly applying default deny during pod startup. This occurred because isolation was checked per PolicyEndpoint chunk rather than after aggregating rules from all chunks.

For example, a NetworkPolicy with both ingress and egress rules might be split into:
- Chunk 1: podIsolation=[Ingress,Egress], only ingress rules
- Chunk 2: podIsolation=[Ingress,Egress], only egress rules

The agent would process Chunk 1, see Egress in podIsolation but len(egressRules)==0, and incorrectly enable default deny on egress until Chunk 2 was processed.

Fix: Move the isolation check outside the PolicyEndpoint aggregation loop to evaluate based on total aggregated rules, not individual chunks.

Fixes: aws/amazon-network-policy-controller-k8s#201
    lastPE = currentPE
}
log().Infof("Total no.of - ingressRules %d egressRules %d", len(ingressRules), len(egressRules))
// Check isolation after aggregating all rules from all PolicyEndpoints
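For readers skimming the diff, here is a minimal, self-contained Go sketch of the aggregate-then-decide flow this change describes. The type names, fields, and the inline isolation logic below are illustrative assumptions reconstructed from the PR description, not the agent's actual code:

```go
package main

import "fmt"

// Simplified stand-ins for the PolicyEndpoint fields involved (assumed shapes,
// for illustration only; the agent's real types come from its PolicyEndpoint API).
type rule struct{ cidr string }

type policyEndpointChunk struct {
	podIsolation []string // e.g. ["Ingress", "Egress"]
	ingress      []rule
	egress       []rule
}

func main() {
	// One NetworkPolicy split into two chunks: chunk 1 carries only the
	// ingress rules, chunk 2 carries only the egress rules.
	chunks := []policyEndpointChunk{
		{podIsolation: []string{"Ingress", "Egress"}, ingress: []rule{{"10.0.0.0/24"}}},
		{podIsolation: []string{"Ingress", "Egress"}, egress: []rule{{"10.0.1.0/24"}}},
	}

	var ingressRules, egressRules []rule
	isolateIngress, isolateEgress := false, false
	for _, pe := range chunks {
		ingressRules = append(ingressRules, pe.ingress...)
		egressRules = append(egressRules, pe.egress...)
		for _, dir := range pe.podIsolation {
			isolateIngress = isolateIngress || dir == "Ingress"
			isolateEgress = isolateEgress || dir == "Egress"
		}
		// Before this change, the isolation check (and its default-deny log)
		// ran here, per chunk, where len(egressRules) can legitimately be 0.
	}

	// After this change: one decision, based on the aggregated totals.
	isIngressIsolated := isolateIngress && len(ingressRules) == 0
	isEgressIsolated := isolateEgress && len(egressRules) == 0
	fmt.Printf("Total no.of - ingressRules %d egressRules %d\n", len(ingressRules), len(egressRules))
	fmt.Println("default deny ingress:", isIngressIsolated, "egress:", isEgressIsolated)
}
```

Run against the two-chunk example, both aggregated totals are non-zero, so neither direction ends up default-denied.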
what issue was this creating?
The final decision will be made once we iterate over all the PEs for a PodIdentifier. It seems the current behavior of doing isIngressIsolated = isIngressIsolated || ingressIsolated and this updated change will both result in the same final behavior.
The issue was the log message. deriveDefaultPodIsolation() logs "Default Deny enabled on Egress" when called with egressRulesCount=0 inside the loop, even though other chunks have egress rules. The final boolean values are correct due to ||, but the premature log message was confusing customers running in standard mode who don't expect default deny for policies with explicit rules.
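For illustration, here is a minimal sketch of the kind of helper being described. The real deriveDefaultPodIsolation lives in the agent and may take different arguments, so the signature, body, and helper name below are assumptions based only on the behavior described in this comment:

```go
package main

import "log"

// deriveDefaultPodIsolationSketch is a hypothetical stand-in for the helper
// discussed above: when a direction is listed in podIsolation and the rule
// count passed in is zero, it reports default deny and logs it. Called per
// chunk, a zero egress count from one chunk triggers the log even though a
// sibling chunk may carry the egress rules.
func deriveDefaultPodIsolationSketch(podIsolation []string, ingressRulesCount, egressRulesCount int) (bool, bool) {
	ingressIsolated, egressIsolated := false, false
	for _, dir := range podIsolation {
		if dir == "Ingress" && ingressRulesCount == 0 {
			ingressIsolated = true
			log.Println("Default Deny enabled on Ingress")
		}
		if dir == "Egress" && egressRulesCount == 0 {
			egressIsolated = true
			log.Println("Default Deny enabled on Egress")
		}
	}
	return ingressIsolated, egressIsolated
}

func main() {
	// Chunk 1 of the example policy: 40 ingress rules, no egress rules yet.
	_, egressIsolated := deriveDefaultPodIsolationSketch([]string{"Ingress", "Egress"}, 40, 0)
	log.Println("per-chunk egress default deny:", egressIsolated) // true, hence the confusing log
}
```

Moving the call after the aggregation loop means it only ever sees the totals across all chunks, so the log line fires only when the NetworkPolicy genuinely has no rules for that direction.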
got it, yeah the log message fix makes sense. The CR title "Fix incorrect default deny when NetworkPolicy is chunked" threw me off; probably edit it to "Fix incorrect Default Deny log when NP is chunked".
the actual issue could be when other chunks haven't made it to the agent yet (as those are still getting processed by the NPC); in that case, Default Deny will still show up.
you're right, this fixes the case where all chunks exist but are being aggregated by the agent. would it be better to check if the PolicyEndpoint name indicates it's part of a chunked set (e.g. contains a chunk suffix) and suppress the log message for chunked policies? or is the sequential arrival case acceptable since it's transient during controller processing?
today there is no way to check whether a PolicyEndpoint has any siblings from the same NetworkPolicy just by looking at its name, or indirectly from other NetworkPolicies.
A chunk-set check (based on some deterministic chunk naming) makes sense from a single Network Policy's perspective, but pods can be targeted by multiple Network Policies, and in this loop we iterate over all PolicyEndpoints belonging to all Network Policies targeting that pod and then make the isolation decision.
LGTM, just to confirm this only happens in
Issue #, if available:
aws/amazon-network-policy-controller-k8s#201
Description of changes:
When a NetworkPolicy with many endpoints is chunked into multiple PolicyEndpoint resources, the agent was incorrectly applying default deny during pod startup. This occurred because isolation was checked per PolicyEndpoint chunk rather than after aggregating rules from all chunks.
For example, a NetworkPolicy with both ingress and egress rules might be split into:
- Chunk 1: podIsolation=[Ingress,Egress], only ingress rules
- Chunk 2: podIsolation=[Ingress,Egress], only egress rules
The agent would process Chunk 1, see Egress in podIsolation but len(egressRules)==0, and incorrectly enable default deny on egress until Chunk 2 was processed.
Fix: Move isolation check outside the PolicyEndpoint aggregation loop to evaluate based on total aggregated rules, not individual chunks.
Testing:
Tested on EKS cluster with NP_CONTROLLER_ENDPOINT_CHUNK_SIZE=50. Created chunked PolicyEndpoints where one chunk had podIsolation=[Ingress,Egress] but only ingress rules. Confirmed the bug by observing "Default Deny enabled on Egress" log message. After applying the fix, the same scenario no longer produces the incorrect default deny message, and egress traffic is properly allowed.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.