-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add-vpc and remove-vpc use Events to create/destroy per-ENI mirroring #42
Conversation
Signed-off-by: Chris Helma <chelma+github@amazon.com>
* Given the move towards an event-based architecture, we needed a way to more easily keep track of what is happening in our system. I added CloudWatch Metrics to the lambda to indicate when each possible outcome occurred. Signed-off-by: Chris Helma <chelma+github@amazon.com>
* Like the CreateEniMirror Lambda, the Destroy Lambda emits metrics to CloudWatch on each possible outcome to help evaluate what's happening in the system. Signed-off-by: Chris Helma <chelma+github@amazon.com>
|
I agree; it created a bit of a mess. I unfortunately discovered it was necessary right in the middle of a bunch of changes in order for the namespacing inside the Lambda containers to work correctly.
So, we need to have a per-ENI Event/Lambda in order to handle the EC2/ECS autoscaling events which themselves operate on a per-ENI level. I'm planning on making another Event/Lambda that operates at the subnet level to detect changes in the ENIs we have mirroring set up for. This is what I was planning on (initially) having perform our automated scheduled scans (see #36). I think there's pros/cons to having that Lambda directly manipulate the ENIs, but my current inclination is to have it bulk-put Create/Destroy events to the Cluster Bus for our per-ENI Lambdas to action instead of doing the operations itself. We have a pre-built distributed event system that is designed to operate at-scale; we might as well use it. The alternative would be having that Subnet lambda tackle them single-threaded. I can see a per-VPC Event/Lambda existing and it's one piece I was planning on using to make the
Hmm, could use a bit more context to understand what you mean here. I would say that, in general, wiring up EventBridge Rules to look for Events and trigger a Lambda is really easy. The one thing I have a question about is how much to atomize our Lambdas. This PR proposes a model where we have separate Create and Destroy Lambdas for each VPC. This makes it easier to track what our system is doing (imo) because they generate separate metrics and logs this way. However, it is more things to keep track of. We may end up combining some Lambda-based responsibilities into unified Lambdas.
Cool! |
Ok. So are you saying in the future add-vpc would just use that? If thats not what you are saying then we will want to do some testing of onboarding large vpcs/subnets creating 1000s of ENI events.
Whenever I'm using an event bus I like to upfront at least discuss how I'm going to do event versioning when I discover I need more/less/different parameters in the messages. The two most common solutions are either a version field in every message or the name of the event changes (such as appending _v2 _v3 etc). Then the discussion is, should the initial implementation have this version marker or not. Such as version: 1 or _v1. The general issue is eventually you'll either have a newer version of the lambda or cli depending on upgrade order. |
Yeah - when we've built up to having a VPC-level Lambda, then we have our
Great point. Let me create a follow-up task for this and we can discuss the best strategy and document in there. |
Created follow-up task to add event versioning: #43 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (except for small conflict :) )
Description
add-vpc
andremove-vpc
CLI commands to emit events to the Cluster Bus to create/destroying the per-ENI mirroring components rather than perform that work themselves. This was basically copy/pasting the existing code into Lambdas with a few minor tweaks.Issues
Testing
add-vpc
andremove-vpc
and confirmed the mirroring resources were successfully created/destroyed by the event-driven Lambdas, that the Lambdas emitted the expected metrics to CloudWatch, and that our test traffic still showed up in the Arkime Dashboard for the Cluster