Update CLI to enable traffic mirroring on VPC #6

Closed
chelma opened this issue Mar 20, 2023 · 14 comments

@chelma
Collaborator

chelma commented Mar 20, 2023

Description

This task is to update the CLI so that it can add traffic mirroring from a traffic source VPC in the user's account to the Capture VPC in the same account. We expect to do this by spinning up a CloudFormation stack that encapsulates the traffic mirroring details. The mirroring setup should be removed when the CLI is invoked to tear down the full Arkime setup; removing the mirroring setup as an individual unit will be a manual (non-CLI) step. At the end of this task, the capture nodes should receive the user traffic and correctly transmit it to the capture bucket and the OpenSearch domain.

Acceptance Criteria

  • Able to set up traffic mirroring for a user VPC via the CLI
  • Captured packets end up in the bucket; metadata ends up in the domain
@chelma
Collaborator Author

chelma commented Apr 4, 2023

Picking up task

@chelma
Collaborator Author

chelma commented Apr 4, 2023

I've been looking into how to structure this mirroring, and Gateway Load Balancers seem ideally suited to this use case [1][2]. However, there does not appear to be higher-level (L2) CDK support for the GWLB resource type yet [3]; I searched through the CDK GitHub code, issues, etc. There is support in raw CloudFormation, however [4].

The mirroring docs indicate the options for mirroring from VPC1 to VPC2 in the same account are: "Intra-Region peering or a transit gateway or Gateway Load Balancer endpoint" [5].

Currently exploring transit gateways to see if they're a plausible option here.

[1] https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-aws-gateway-load-balancer-supported-architecture-patterns/
[2] https://docs.aws.amazon.com/vpc/latest/mirroring/tm-example-glb-endpoints.html
[3] https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk-lib/aws-elasticloadbalancingv2
[4] https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticloadbalancingv2-loadbalancer.html#cfn-elasticloadbalancingv2-loadbalancer-type
[5] https://docs.aws.amazon.com/vpc/latest/mirroring/traffic-mirroring-connection.html

@chelma
Collaborator Author

chelma commented Apr 4, 2023

Well, first off, there is low-level CDK support for Transit Gateways, which is a plus [1]. From what I can tell, a Transit Gateway makes addresses in connected VPCs routable [2], rather than presenting a single endpoint for traffic to flow to the way a Gateway Load Balancer does. That probably isn't what we want here.

I'm going to get a feel for how hard it would be to set up Gateway Load Balancers with CloudFormation by writing my own CDK Construct, as GLBs seem like the right answer here.

[1] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.CfnTransitGateway.html
[2] https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html

@chelma
Collaborator Author

chelma commented Apr 4, 2023

OK - I did a deep dive into setting up GLBs with CDK; it seems doable but will require some trial and error.

The GLB getting started guide [1] provides a roadmap for setting these up. The components are:

  • Gateway Load Balancer: Will need to make my own from the L1 construct [2]. Needs to be created in the Capture VPC, configured with the Subnet IDs of the consuming capture nodes.
  • Target Groups: Will need to make my own from the L1 construct [3]. Configured with the Capture VPC ID.
  • Register Capture Nodes: The Capture Node Fargate Service needs to be registered with the target group. Looks like there are a few ways to do this, but it won't require any special code.
  • Listeners: Looks like I may need to make my own from the L1 construct [4].

None of the above resources are unique to any traffic source, so they should probably be created as part of the Capture VPC setup.

The following component is unique to each traffic source AWS account. In our case, we're staying within the same account.

  • VPCEndpointService: Can probably use the standard construct - YAY! Might need to implement a simple interface. [5]

The following components are unique to each traffic source VPC:

  • VPC Endpoints: Will need to make my own from the L1 construct [6]. Needed for each VPC/subnet that traffic will flow from.
  • Routing: Needed in the traffic source VPC. Can probably use default constructs.

[1] https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/getting-started-cli.html
[2] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnLoadBalancer.html
[3] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnTargetGroup.html
[4] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_elasticloadbalancingv2.CfnListener.html
[5] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.VpcEndpointService.html
[6] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.CfnVPCEndpoint.html
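
As a rough illustration of the components listed above, here is a minimal sketch that emits them as raw CloudFormation from plain Python dicts. It is not the project's construct code: the helper names, logical IDs, `instance` target type, and TCP health check are hypothetical choices. The GENEVE/6081 target-group settings, the port-less GWLB listener, and the one-subnet-per-GWLB-endpoint rule do reflect GWLB requirements. The same property names map directly onto the L1 constructs (`CfnLoadBalancer`, `CfnTargetGroup`, `CfnListener`, `CfnVPCEndpointService`, `CfnVPCEndpoint`).

```python
import json


def capture_vpc_template(vpc_id: str, subnet_ids: list[str], health_check_port: int) -> dict:
    """Capture-VPC resources shared by every traffic source (hypothetical logical IDs)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "GatewayLB": {
                "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
                "Properties": {"Type": "gateway", "Subnets": subnet_ids},
            },
            "CaptureTargetGroup": {
                "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
                "Properties": {
                    # GWLB target groups always use GENEVE on port 6081
                    "Protocol": "GENEVE",
                    "Port": 6081,
                    "VpcId": vpc_id,
                    "TargetType": "instance",  # assumption: EC2-backed capture nodes
                    "HealthCheckProtocol": "TCP",  # assumption; HTTP/HTTPS are also allowed
                    "HealthCheckPort": str(health_check_port),
                },
            },
            "GatewayListener": {
                "Type": "AWS::ElasticLoadBalancingV2::Listener",
                "Properties": {
                    # GWLB listeners take no Port/Protocol; they forward all traffic
                    "LoadBalancerArn": {"Ref": "GatewayLB"},
                    "DefaultActions": [
                        {"Type": "forward", "TargetGroupArn": {"Ref": "CaptureTargetGroup"}}
                    ],
                },
            },
            "EndpointService": {
                "Type": "AWS::EC2::VPCEndpointService",
                "Properties": {
                    "GatewayLoadBalancerArns": [{"Ref": "GatewayLB"}],
                    "AcceptanceRequired": False,  # same account, so auto-accept connections
                },
            },
        },
        # Ref on a VPCEndpointService yields its service ID (vpce-svc-...)
        "Outputs": {"EndpointServiceId": {"Value": {"Ref": "EndpointService"}}},
    }


def source_vpc_endpoint_template(vpc_id: str, subnet_id: str, service_id: str) -> dict:
    """One GWLB endpoint in one traffic-source subnet (a GWLB endpoint takes a single subnet)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "MirrorEndpoint": {
                "Type": "AWS::EC2::VPCEndpoint",
                "Properties": {
                    "VpcEndpointType": "GatewayLoadBalancer",
                    "VpcId": vpc_id,
                    "SubnetIds": [subnet_id],
                    # Endpoint service names take the form com.amazonaws.vpce.<region>.<service-id>
                    "ServiceName": {
                        "Fn::Sub": f"com.amazonaws.vpce.${{AWS::Region}}.{service_id}"
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(capture_vpc_template("vpc-0abc", ["subnet-1"], 8080), indent=2))
```

Splitting the output into two templates mirrors the split above: the first is created once with the Capture VPC, the second once per traffic-source subnet.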

@chelma
Collaborator Author

chelma commented Apr 4, 2023

Looks like there are some examples of doing some or all of this floating around. Here's one of them [1].

[1] https://github.com/aws-samples/aws-secure-environment-accelerator/blob/main/src/lib/cdk-constructs/src/vpc/glb.ts

@chelma
Collaborator Author

chelma commented Apr 5, 2023

I've been fighting with Gateway LBs, ECS/Fargate, and CloudFormation all day. I feel like I have a good handle on the CloudFormation required to set up a GWLB. However, I'm having trouble getting my Fargate Tasks to register with the GWLB's target group.

GWLBs seem like a niche feature, and there aren't many examples floating around for them. The only ones I can find that combine ECS with GWLBs use EC2-backed ECS and associate the Target Group with the EC2 ASG [1]. The ECS docs themselves don't even mention GWLBs [2]. Running through the "Getting Started" AWS CLI steps [3] and specifying the IPs of my Fargate Tasks as the Target Group targets with aws elbv2 register-targets ... works, but is obviously manual.

I'm pretty close to switching over from Fargate to ECS-on-EC2; I'll give it a few more minutes though.

[1] https://github.com/aws-samples/aws-gateway-load-balancer-suricata-ids-ips-nsm/tree/45e590061a47a5bd022c62871d80b62cc23d0d4d
[2] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/load-balancer-types.html
[3] https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/getting-started-cli.html
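
The manual register-targets step could in principle be scripted. The following is a hedged sketch, not what was ultimately done (the thread switches to ECS-on-EC2 instead): it reads each running task's private IPv4 address from the ECS `describe_tasks` response and registers those IPs with the target group. It assumes the target group uses the `ip` target type, takes boto3-style `ecs`/`elbv2` clients as parameters so the logic can be exercised without AWS, and ignores pagination past 100 tasks. The helper names are hypothetical.

```python
from typing import Any


def fargate_task_ips(ecs: Any, cluster: str, service: str) -> list[str]:
    """Return the private IPv4 address of each running task in an ECS service."""
    # list_tasks returns at most 100 ARNs per call; pagination is ignored here.
    task_arns = ecs.list_tasks(cluster=cluster, serviceName=service)["taskArns"]
    if not task_arns:
        return []
    ips = []
    for task in ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]:
        # awsvpc tasks expose their ENI details as name/value pairs on the attachment
        for attachment in task.get("attachments", []):
            for detail in attachment.get("details", []):
                if detail["name"] == "privateIPv4Address":
                    ips.append(detail["value"])
    return ips


def register_task_ips(elbv2: Any, target_group_arn: str, ips: list[str]) -> None:
    """Register task IPs as targets; assumes the target group's TargetType is 'ip'."""
    if not ips:
        return
    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": ip} for ip in ips],
    )


# Example wiring (requires AWS credentials; cluster/service names are hypothetical):
#   import boto3
#   ips = fargate_task_ips(boto3.client("ecs"), "my-cluster", "capture-service")
#   register_task_ips(boto3.client("elbv2"), "<target-group-arn>", ips)
```

One drawback of this approach: IPs registered this way go stale whenever ECS replaces a task, so the script would need to rerun on every deployment.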

@chelma
Collaborator Author

chelma commented Apr 5, 2023

On another note, I've also been having trouble with my ECS Service resources failing to stabilize during CloudFormation operations; very annoying. There's a fairly unhelpful support post about it [1].

[1] https://repost.aws/knowledge-center/cloudformation-ecs-service-stabilize

@chelma
Collaborator Author

chelma commented Apr 5, 2023

Last-ditch effort failed; switching to ECS-on-EC2.

@chelma
Collaborator Author

chelma commented Apr 5, 2023

Switching over to ECS-on-EC2 was easy, and it appears that the GWLB and our ECS Cluster are integrated now. The Cluster's containers don't respond to LB health checks yet, so it's hard to tell... but things look good enough for me to move on for the moment.

@chelma
Collaborator Author

chelma commented Apr 5, 2023

OK - ready to start tackling mirroring setup. However, to do that I need to have a plan for how all of our top-level calls will work together to handle state management. Here's what I'm currently thinking.

  • create-cluster --name my-cluster

    • Ensure SSM parameter for the cluster doesn't already exist
    • Create VPC Stack, Bucket Stack, OpenSearch Stack, Capture Nodes Stack
    • Create SSM parameter(s):
      /arkime/clusters/my-cluster
      => Store VPCEndpoint Service ID
  • add-vpc my-cluster vpc-1234

    • Use boto to get the subnets in the VPC; supply the VPC and subnets to CDK via context
    • Use CDK to create the VPCEndpoint Stack, which puts resources in the traffic source VPC
      • VPC Endpoints, Route Table, Traffic Mirroring Targets pointing to each VPCE, and Traffic Filters/Filter Rules that all ENI-specific traffic sessions will share
    • Use boto to create SSM parameters to store these details
      /arkime/clusters/my-cluster/vpcs/vpc-1234
      => Store VPCEndpoint Service ID
      /arkime/clusters/my-cluster/vpcs/vpc-1234/subnets/subnet-1
      /arkime/clusters/my-cluster/vpcs/vpc-1234/subnets/subnet-2
      => store Mirroring Target ID
      => store Mirroring Filter ID
    • Use boto to find the network interfaces in each subnet
    • Use boto to create Traffic Mirror Sessions connecting each ENI to the Target/Filter for its subnet, using the SSM parameters to get their IDs
    • Create SSM parameter(s)
      /arkime/clusters/my-cluster/vpcs/vpc-1234/subnets/subnet-1/enis/eni-1111
      /arkime/clusters/my-cluster/vpcs/vpc-1234/subnets/subnet-2/enis/eni-1112
      /arkime/clusters/my-cluster/vpcs/vpc-1234/subnets/subnet-2/enis/eni-1113
      => store Traffic Mirror Session Identifier
  • list-clusters

  • remove-vpc my-cluster vpc-1234

    • Use boto to get all currently-captured subnets/ENIs for the VPC using SSM paths
    • Use boto to delete all ENI-specific Mirroring Sessions and the ENI-level SSM parameters
    • Use boto to delete all subnet-specific Mirroring Targets and Mirroring Filters, as well as the subnet-level SSM parameters
    • Use boto to delete the VPC-level SSM parameter
    • Use CDK to tear down the VPC-specific VPC Endpoint stack
  • destroy-cluster my-cluster

    • Use boto to check in SSM whether there are any VPCs under the cluster; abort if there are
    • Kick off the existing destroy behavior
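
To make the add-vpc bookkeeping more concrete, here is a minimal sketch of its per-subnet step: find the ENIs in a subnet, create a Traffic Mirror Session for each, and record the session ID under the ENI-level SSM path from the plan. The path helpers follow the hierarchy above; the function names, the fixed SessionNumber=1 (session numbers order evaluation when one ENI has several sessions, so 1 assumes no other sessions exist), and the skipping of non-"interface" ENIs (e.g. the GWLB endpoint's own ENI) are assumptions, not the project's code. Clients are passed in so the logic can be tested with stubs, and describe_network_interfaces pagination is ignored for brevity.

```python
from typing import Any

CLUSTERS_ROOT = "/arkime/clusters"


def vpc_path(cluster: str, vpc_id: str) -> str:
    return f"{CLUSTERS_ROOT}/{cluster}/vpcs/{vpc_id}"


def subnet_path(cluster: str, vpc_id: str, subnet_id: str) -> str:
    return f"{vpc_path(cluster, vpc_id)}/subnets/{subnet_id}"


def eni_path(cluster: str, vpc_id: str, subnet_id: str, eni_id: str) -> str:
    return f"{subnet_path(cluster, vpc_id, subnet_id)}/enis/{eni_id}"


def mirror_subnet_enis(ec2: Any, ssm: Any, cluster: str, vpc_id: str, subnet_id: str,
                       target_id: str, filter_id: str) -> dict[str, str]:
    """Create one mirror session per ENI in the subnet; return {eni_id: session_id}."""
    resp = ec2.describe_network_interfaces(
        Filters=[{"Name": "subnet-id", "Values": [subnet_id]}]
    )
    created = {}
    for eni in resp["NetworkInterfaces"]:
        # Assumption: only mirror ordinary ENIs, not endpoint/load-balancer ENIs
        if eni.get("InterfaceType") != "interface":
            continue
        eni_id = eni["NetworkInterfaceId"]
        session = ec2.create_traffic_mirror_session(
            NetworkInterfaceId=eni_id,
            TrafficMirrorTargetId=target_id,
            TrafficMirrorFilterId=filter_id,
            SessionNumber=1,  # assumption: no other sessions on this ENI
        )["TrafficMirrorSession"]
        session_id = session["TrafficMirrorSessionId"]
        # Record the session so remove-vpc can find and delete it later
        ssm.put_parameter(
            Name=eni_path(cluster, vpc_id, subnet_id, eni_id),
            Value=session_id,
            Type="String",
        )
        created[eni_id] = session_id
    return created
```

Storing one parameter per ENI makes teardown a matter of walking the VPC's subtree of parameters, which is what the remove-vpc steps rely on.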

@chelma
Collaborator Author

chelma commented Apr 5, 2023

SSM doc for dealing with parameter hierarchies: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-paramstore-hierarchies.html
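
Reading the hierarchy back is where get_parameters_by_path comes in. Below is a hedged sketch of how list-clusters, the destroy-cluster guard, and the session cleanup in remove-vpc might walk the parameter tree, assuming the path layout from the plan; the helper names are hypothetical and the SSM/EC2 clients are passed in. With Recursive=False the call returns only parameters one level below the given path (one entry per cluster, or per VPC); with Recursive=True it returns everything beneath it.

```python
from typing import Any, Iterator


def params_under(ssm: Any, path: str, recursive: bool) -> Iterator[dict]:
    """Yield every parameter below `path`, following NextToken pagination."""
    kwargs = {"Path": path, "Recursive": recursive}
    while True:
        page = ssm.get_parameters_by_path(**kwargs)
        yield from page["Parameters"]
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def list_clusters(ssm: Any) -> list[str]:
    # Direct children of /arkime/clusters are the per-cluster parameters
    params = params_under(ssm, "/arkime/clusters", recursive=False)
    return [p["Name"].rsplit("/", 1)[-1] for p in params]


def cluster_has_vpcs(ssm: Any, cluster: str) -> bool:
    # destroy-cluster guard: any VPC-level parameter means mirroring is still attached
    return any(True for _ in params_under(ssm, f"/arkime/clusters/{cluster}/vpcs", recursive=False))


def remove_vpc_sessions(ec2: Any, ssm: Any, cluster: str, vpc_id: str) -> None:
    """Delete every ENI-level mirror session recorded under the VPC, plus its parameter."""
    # Materialize the listing first so deletions don't disturb pagination
    params = list(params_under(ssm, f"/arkime/clusters/{cluster}/vpcs/{vpc_id}", recursive=True))
    for p in params:
        if "/enis/" in p["Name"]:
            ec2.delete_traffic_mirror_session(TrafficMirrorSessionId=p["Value"])
            ssm.delete_parameter(Name=p["Name"])
```

The create-cluster existence check can reuse the same walk, e.g. by testing whether the new name already appears in list_clusters(ssm).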

@chelma
Collaborator Author

chelma commented Apr 10, 2023

PR posted for basic list-clusters, add-vpc, and remove-vpc capability (see: #17).

I confirmed that traffic is being mirrored from the traffic source (our demo Fargate containers curling the Alexa top-100 sites) to the VPC Endpoint of our Gateway Load Balancer. I wasn't able to confirm that the traffic makes it to our capture nodes, because I can't add them to the GWLB Target Group unless they respond to health checks, which they currently don't. Therefore, the next steps are:

  • Clean up/unit test the basic code for the commands I just added
  • Set up Arkime on our Capture Nodes and ensure they respond to GWLB health checks.
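
GWLB target groups support HTTP, HTTPS, or TCP health checks. As a placeholder until Arkime itself is wired up, a capture node could answer an HTTP health check with something as small as the sketch below (Python standard library only). The port is whatever the target group's health check is configured to use, and the function name is hypothetical. A check that always returns 200 only proves the host is reachable, so a real one should probably also verify that the capture process is running.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthCheckHandler(BaseHTTPRequestHandler):
    """Answers every GET with 200 OK so the target registers as healthy."""

    def do_GET(self) -> None:
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent health-check polls out of the logs


def start_health_check_server(port: int) -> HTTPServer:
    """Serve health checks on a background daemon thread; port 0 picks a free port."""
    server = HTTPServer(("0.0.0.0", port), HealthCheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```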

@chelma
Collaborator Author

chelma commented Apr 13, 2023

PR posted to resolve the task: #19

There will be some follow-up/cleanup work.

@chelma
Collaborator Author

chelma commented Apr 14, 2023

PR merged; resolving. Follow-up work is discussed in the parent task (#3).

chelma closed this as completed Apr 14, 2023