Skip to content

A comprehensive solution for dynamic Prometheus service discovery in AWS ECS using OpenTelemetry. This tool enables HTTP-based discovery of ECS tasks and CloudMap services, allowing for flexible filtering, label management, and scalable monitoring. Ideal for teams leveraging AWS ECS.

License

Notifications You must be signed in to change notification settings

apptality/aws-ecs-cloudmap-prometheus-sd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AWS ECS & CloudMap Prometheus Discovery

Build License: MIT Docker Image Version Docker Image Size (tag) Medium

Overview

This application facilitates discovery of ECS and/or CloudMap resources and compatible with Prometheus HTTP Service Discovery.

Application leverages AWS API to dynamically discover:

making it easier for Prometheus to monitor these services without hardcoding their addresses and ports, or using any kind of file-based discovery (thus can be run as a standalone application: as it's own AWS ECS Service, for example).

AWS ECS & CloudMap Prometheus Discovery

Features

Below is a high level overview of what application is capable of. For the full list of supported configuration parameters and how to override these using Docker Environment Variables please refer to appsettings.json

  • Prometheus Compatibility: By default, exposes 9001:/prometheus-targets endpoint which response is compatible with <http_sd_config> of Prometheus configuration file, as well as with OpenTelemetry Prometheus Receiver (see example below).

  • ECS Discovery: Supports discovery of ECS clusters, services, running tasks

    • Supply EcsClusters as semicolon separated string of ECS clusters to include: "cluster1;cluster2;"
    • Supports filtering of ECS Services that should be included in the Prometheus response based on Resource Tags (see EcsServiceSelectorTags configuration property)
    • Supports filtering to control which tags should be included as labels in the response (see EcsTaskTags, EcsServiceTags configuration properties)
  • AWS CloudMap Integration: Supports discovery of CloudMap namespaces, services and instances

    • Supply CloudMapNamespaces as semicolon separated string of CloudMap namespaces to include: "namespace1;"
    • Supports filtering of CloudMap Services that should be included in the Prometheus response based on Resource Tags (see CloudMapServiceSelectorTags configuration property)
    • Supports filtering to control which tags should be included as labels in the response (see CloudMapServiceTags, CloudMapNamespaceTags configuration properties)
  • Scalable and Flexible: Built to be scalable and flexible, adapting to changes in the service infrastructure

    • Allows having any number of port-metrics pairs per each running ECS Task (useful when your ECS Task Definition defines multiple containers, which also expose ports and metrics endpoint). See MetricsPathPortTagPrefix configuration property for more details.
    • Allows supplying static set of labels to be added to every Prometheus target (see ExtraPrometheusLabels configuration property)
    • Supports rich mechanism for re-labling, so you don't have to fix your Grafana dashboards. See RelabelConfigurations configuration property for more details.
    • Exposes health check endpoint: /health

At least one of EcsClusters or CloudMapNamespaces must be provided for application to work. If you would like to include both Service Connect and Service Discovery targets, CloudMapNamespaces must be specified (with or without EcsClusters). When both parameters are specified, the end result is only an intersection of targets that exist in ECS clusters and CloudMap namespaces provided.

Permissions: IAM permissions are required to discover ECS clusters (ecs:Get*, ecs:List*, ecs:Describe*) and CloudMap namespaces (servicediscovery:Get*, servicediscovery:List*, servicediscovery:Discover*, route53:Get*)

Usage Example

For the full example of running this in AWS, please navigate to /example folder. Integrates with OpenTelemetry receivers config.

Refer to appsettings.json for all supported configuration options.

Example run command:

  docker run --rm \
    # ** REQUIRED PARAMETERS **
    # At least one of "EcsClusters" or "CloudMapNamespaces" must be provided.
    -e DiscoveryOptions__EcsClusters="<cluster1>;<cluster2>" \
    -e DiscoveryOptions__CloudMapNamespaces="<namespace1>;" \
    # If running outside of AWS - pass credentials explicitly
    # If running in AWS - use Task IAM Role.
    # IMPORTANT: Credentials need to have permissions to describe ECS cluster(optionally CloudMap namespaces)
    -e AWS_REGION="us-west-1" \
    -e AWS_ACCESS_KEY_ID="<key>" \
    -e AWS_SECRET_ACCESS_KEY="<secret>" \
    -e AWS_SESSION_TOKEN="<session_token>" \
    # ** OPTIONAL PARAMETERS **
    # Will only include those services, which have 'prom_scrape_target' tag set to 'yes'.
    # Leave blank to include all services in cluster(s)
    -e DiscoveryOptions__EcsServiceSelectorTags="prom_scrape_target=yes;" \
    # Same as above, but for filtering out CloudMap services
    # Leave blank ot include all services in namespace(s)
    -e DiscoveryOptions__CloudMapServiceSelectorTags="prom_scrape_target=yes;" \
    # Semicolon separated string of tag keys to include in the service discovery response as metadata.
    # Supports glob pattern matching using * and ?.
    -e DiscoveryOptions__EcsTaskTags="*" \
    -e DiscoveryOptions__EcsServiceTags="*" \
    -e DiscoveryOptions__CloudMapServiceTags="AmazonECS*;" \
    -e DiscoveryOptions__CloudMapNamespaceTags="*" \
    # Semicolon separated string of labels to include in the service discovery response as metadata.
    # Will be added to all discovered targets.
    -e DiscoveryOptions__ExtraPrometheusLabels="custom_static_tag=my-static-tag;" \
    # Tag prefix to identify metrics port, [path | "/metrics"], [name | ""] triplets.
    # Please refer to the configuration options to learn more.
    -e DiscoveryOptions__MetricsPathPortTagPrefix="METRICS_" \
    # Add new or modify existing labels in the response using token replacements,
    # to prevent the need of modifying your Grafana dashboards.
    # Please refer to the configuration options to learn more.
    -e DiscoveryOptions__RelabelConfigurations="cluster_and_service={{_sys_ecs_cluster}}-{{_sys_ecs_service}}" \
    # Instructs .NET application to listen on 9001 inside the container
    -e ASPNETCORE_URLS="http://*:9O01" \
    # ** DOCKER **
    -p 9001:9001 \
    apptality/aws-ecs-cloudmap-prometheus-discovery:latest

Example output:

[
  {
    "targets": [
      "10.200.10.200:8080"
    ],
    "labels": {
      "__metrics_path__": "/metrics",
      "instance": "10.200.10.200",
      "scrape_target_name": "app",
      "__meta_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
      "__meta_cloudmap_service_name": "service-app",
      "__meta_cloudmap_service_type": "ServiceConnect",
      "__meta_ecs_cluster": "my-ecs-cluster",
      "__meta_ecs_service": "my-fargate-application",
      "__meta_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
      "__meta_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
      "_sys_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
      "_sys_cloudmap_service_name": "my-fargate-application",
      "_sys_cloudmap_service_type": "ServiceConnect",
      "_sys_ecs_cluster": "my-ecs-cluster",
      "_sys_ecs_service": "my-fargate-application",
      "_sys_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
      "_sys_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
      "prom_scrape_target": "yes",
      "AmazonECSManaged": "true",
      "custom_static_tag": "my-static-tag",
      "cluster_and_service": "my-ecs-cluster-my-fargate-application"
    }
  },
  ...
]

Response Structure

Response is returned in HTTP_SD format compatible format:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

Success

Success response is returned with application/json HTTP Content Type, and 200 HTTP status code.

Some clarification on labels returned:

  • scrape_target_name - when any of resource contains METRICS_NAME_ AWS Resource Tag - this label will have tag value. ECS Tasks can contain multiple containers you would like to scrape from, this label being present helps to denote between such containers when running PromQL queries.
  • __meta - labels starting with this prefix are meta labels, and are not included into the resulting set stored Prometheus, but can be used for re-labeling.
  • _sys - labels starting with this prefix are generated out of AWS Resources properties (ECS, CloudMap). They are prefixed as such in order to prevent conflict with your existing infrastructure configuration, and combined with RelabelConfigurations enable powerful manipulations of the resulting labels set.

Anything else returned is either inferred from AWS Resource Tags specified via selectors (EcsTaskTags, EcsServiceTags, CloudMapServiceTags, CloudMapNamespaceTags), supplied via ExtraPrometheusLabels, or product of RelabelConfigurations configurations.

Please note, that tags are resolved from AWS resource in the following order:

ECS Task > ECS Service > CloudMap Service > CloudMap Namespace

meaning that if ECS Service has tag "MyCustomTag=EcsService" and CloudMap Service has the same tag, but with different value: "MyCustomTag=CloudMap" - the resulting scrape target label will have value of the former:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "MyCustomTag": "EcsService", ...
    }
  },
  ...
]

Error

When application runs into an error, response is returned with 500 HTTP status code:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.6.1",
  "title": "An error occurred while processing your request.",
  "status": 500
}

You'll need to investigate server logs for exception details:

2024-08-20 21:22:23.242 -03:00 [ERR] HTTP GET /prometheus-targets responded 500 in 99.9711 ms
2024-08-20 21:22:23.243 -03:00 [ERR] An unhandled exception has occurred while executing the request.
Microsoft.Extensions.Options.OptionsValidationException: At least one of 'EcsClusters' or 'CloudMapNamespaces' name must be specified.
   at Microsoft.Extensions.Options.OptionsFactory`1.Create(String name)
...

Storing this image in ECR

~/.scripts/dockerhub-to-ecr.sh script is intended for users who need control over their Docker images in AWS ECR, allowing to pull an image from DockerHub, re-tag it, and push it to an ECR repository with customizable options for platform and tagging.

Run the script specifying your ECR URL as the first positional parameter:

# <paste in your AWS credential>

cd ./scripts

chmod +x dockerhub-to-ecr.sh

./dockerhub-to-ecr.sh <target_ecr_repository_url> \
    # Defaults to 'apptality/aws-ecs-cloudmap-prometheus-discovery:latest'
    [source_dockerhub_image_url] \
    # Specify explicit value of image tag you want to label your ECR image with.
    # By default, same tag as on source image will be applied.
    [target_ecr_repository_tag (Default: same as source)] \
    # Specify docker platform to re-push. Amd64/Arm64 available.
    [docker_image_platform: 'linux/amd64' (Default) | 'linux/arm64']

Example:

./dockerhub-to-ecr.sh 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd

The above script will:

  1. Authenticate with AWS ECR region us-west-2
  2. Pull linux/amd64 platform of apptality/aws-ecs-cloudmap-prometheus-discovery:latest locally
  3. Re-tag it to 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest
  4. Push 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest to ECR

Positional Parameters

  • target_ecr_repository_url: The full URL of your target AWS ECR repository without tag.
  • source_dockerhub_image_url (optional): DockerHub image to pull (default: apptality/aws-ecs-cloudmap-prometheus-discovery:latest).
  • target_ecr_repository_tag (optional): ECR image tag (defaults to the source image tag).
  • docker_image_platform (optional): Image platform (linux/amd64 by default).

Note: you need to create destination ECR repository first.

References

This application is heavily influenced by the following articles:

Definitely suggest reading up on what is AWS Distro for OpenTelemetry (ADOT), as it provides open source APIs, libraries, and agents to collect logs, metrics, and traces from your applications.

For how to integrate AWS AMP (Prometheus) with your Grafana please refer to Set up Grafana open source or Grafana Enterprise for use with Amazon Managed Service for Prometheus article by AWS.

Alternatives

There are definitely some discovery tools that are alternative to current application:

  • Container Insights Prometheus metrics monitoring discovers your infrastructure using CloudWatch
    • IMHO - How CloudWatch agent configuration is structured around discovery of resources is NOT idiomatic to AWS. AWS has a powerful, rich support of tags on pretty much any resource. It would be very natural & AWS'ish to use tagging filters for resource selection instead of complex Regex filters.
  • aws-cloudmap-prometheus-sd plugin by AWS
  • prometheus-for-ecs plugin by AWS. Definitely dig into deploy-prometheus and deploy-adot folders of the repository to get a good understanding of how integrations are being set up.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues to discuss potential improvements or features.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

A comprehensive solution for dynamic Prometheus service discovery in AWS ECS using OpenTelemetry. This tool enables HTTP-based discovery of ECS tasks and CloudMap services, allowing for flexible filtering, label management, and scalable monitoring. Ideal for teams leveraging AWS ECS.

Topics

Resources

License

Stars

Watchers

Forks