Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws] [cloudwatch_metrics] Map aws.dimensions field as object #11883

Merged
merged 6 commits into from
Jan 2, 2025

Conversation

zmoog
Copy link
Contributor

@zmoog zmoog commented Nov 26, 2024

Proposed commit message

Change the mapping type for the aws.dimensions field from flattened to object.

Currently, all *_metrics data streams but one use the object mapping. The cloudwatch_metrics data stream uses the flattened type instead.

We need to unify the mapping of aws.dimensions across all metrics-related data streams in the AWS integration.
If all data streams use the exact mapping for aws.dimensions, users will be able to query and build a dashboard that correlates data across different data streams.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

How to test this PR locally

  • Start the local stack (use 8.17.0)
  • Install the latest released integration (using AWS 2.36.2)
  • Check the component template
  • Post the test document
  • Try to visualize the metric in Kibana
  • Upgrade the integration to this PR version
  • Delete the data stream and re-index the same test document
  • Re-try to visualize the metric in Kibana using the new mapping

Check the component template

With 2.36.2, the component template should be something similar to:

// GET _component_template/metrics-aws.cloudwatch_metrics@package
{
  "aws": {
    "dynamic": true,
    "type": "object",
    "properties": {
      "dimensions_fingerprint": {
        "time_series_dimension": true,
        "type": "keyword"
      },
      "cloudwatch": {
        "properties": {
          "namespace": {
            "ignore_above": 1024,
            "type": "keyword"
          }
        }
      },
      "tags": {
        "type": "flattened"
      },
      "dimensions": {
        "type": "flattened"
      }
    }
  }
}

Post the test document

Using the Dev Tools, post the following test document:

// POST metrics-aws.cloudwatch_metrics-sdh5390/_doc
{
  "@timestamp": "2024-12-30T14:57:07+01:00",
    "aws": {
      "dimensions": {
        "name": "Maurizio Branca",
        "AutoScalingGroupName": "whatever"
        // "ImageId": "whatever",
        // "InstanceId": "whatever"
        //"InstanceType": "whatever"
      },
      "metric": {
        "cpu": 10
      }
  }
}

Try to visualize the metric in Kibana

If I try to visualize the CPU metric in Kibana, the dimension subfields in aws.dimensions.* are not available:

CleanShot 2024-12-30 at 16 53 42@2x

Upgrade the integration to this PR version

Bump the integration version (for example, 2.37.0), and build and update the integration:

elastic-package build && elastic-package stack up -d -v --services package-registry

Delete the data stream and re-index the same test document

// DELETE _data_stream/metrics-aws.cloudwatch_metrics-sdh5390

// POST metrics-aws.cloudwatch_metrics-sdh5390/_doc
{
  "@timestamp": "2024-12-30T14:57:07+01:00",
    "aws": {
      "dimensions": {
        "name": "Maurizio Branca",
        "AutoScalingGroupName": "whatever"
        // "ImageId": "whatever",
        // "InstanceId": "whatever"
        //"InstanceType": "whatever"
      },
      "metric": {
        "cpu": 10
      }
  }
}

If we check the mapping, we should now see a dynamic template:

CleanShot 2024-12-30 at 16 59 59@2x

Re-try to visualize the metric in Kibana using the new mapping

With the updated mapping, we can now use aws.dimensions.* fields in Kibana to break down values by a dimension:

CleanShot 2024-12-30 at 17 01 32@2x

Related issues

@zmoog zmoog self-assigned this Nov 26, 2024
@zmoog zmoog changed the title [AWS] Map aws.dimensions as object of keywords [aws] [metrics] Map aws.dimensions field as object of keywords Nov 26, 2024
@zmoog zmoog changed the title [aws] [metrics] Map aws.dimensions field as object of keywords [aws] [metrics] Map aws.dimensions field as object Nov 26, 2024
@zmoog zmoog added Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] enhancement New feature or request labels Nov 26, 2024
@elastic-vault-github-plugin-prod
Copy link

elastic-vault-github-plugin-prod bot commented Nov 26, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@zmoog zmoog marked this pull request as ready for review November 27, 2024 11:20
@zmoog zmoog requested review from a team as code owners November 27, 2024 11:20
@zmoog zmoog requested review from jsoriano and felixbarny November 27, 2024 11:20
@zmoog
Copy link
Contributor Author

zmoog commented Nov 27, 2024

All *_metrics data streams use the object mapping.

For example, here is the dimensions definition for EC2 metrics:

# packages/aws/data_stream/ec2_metrics/fields/fields.yml
- name: aws
  type: group
  fields:
    - name: dimensions
      type: group
      fields:
        - name: AutoScalingGroupName
          type: keyword
          dimension: true
          description: An Auto Scaling group is a collection of instances you define if you're using Auto Scaling.
        - name: ImageId
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI)
        - name: InstanceId
          type: keyword
          dimension: true
          description: Amazon EC2 instance ID
        - name: InstanceType
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running with this specified instance type.

Since ALL metrics data streams use this mapping, it probably makes sense to use the object type in the cloudwatch_metrics as well instead of the flattened type. We can't do the opposite since TSDB doesn't allow dimension: true on flattened type.

Copy link
Contributor

@tetianakravchenko tetianakravchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a change log and the version change

@@ -5,8 +5,10 @@
type: flattened
description: |
Tag key value pairs from aws resources.
- name: dimensions
type: flattened
- name: dimensions.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be added explicitly subobjects: false ? for custom dimension keys

Copy link
Contributor Author

@zmoog zmoog Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but this requires a package-spec update, which needs several changes. We need to make these changes happen in a dedicated PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, maybe add a comment with this information in the field definition as a reminder

Copy link
Contributor Author

@zmoog zmoog Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a task to upgrade the package-spec to (at least) 3.1.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it's simpler than expected: there were a lot of errors, but they are instances of the same class of error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR to bump the package-spec version to 3.3.0 is #11893

@tetianakravchenko, are you okay to bump package-spec to 3.3.0 and introduce subobjects: false in dedicated PRs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmoog sure, lets add it in dedicated PR 👍

@MichaelKatsoulis
Copy link
Contributor

Looks good to me. Could you add a section in the description on how to test this locally. A use case that would prove that this uniformity solves the problem of data correlation?

- name: dimensions.*
type: object
object_type: keyword
object_type_mapping_type: "*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't * the default? If so, this can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not looking in the wrong place, the object_type_mapping_type definition does not have a default value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that I think you can remove object_type_mapping_type completely which will also match all mapping types, as no explicit subset of mapping types have been specified.

- name: dimensions.*
type: object
object_type: keyword
object_type_mapping_type: "*"
description: |
Metric dimensions.
- name: dimensions_fingerprint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could remove the fingerprint in favor of mapping aws.dimensions.* as dimensions. If we decide to do that, we should apply some of the learnings from the compatibility issues introduced in the prometheus integration when we removed the fingerprint and also perform dedicated upgrade tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see this change was reverted in the end for the prometheus #10766
There is a meta issue - #9910, I think such change should be part of this meta issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part that wasn't reverted is that labels are still mapped as dimensions and the fingerprint is not mapped as a dimension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a meta issue - #9910, I think such change should be part of this meta issue

Yeah, I'm also in favour of leaving this change to the meta issue #9910 and keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.

WDYT @felixbarny ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd split this into two parts. What I think we should do in this PR is to do the same what we ended up doing with the Prometheus integration - map dimensions.* as TSDB dimensions, keep the fingerprint processing and field but not map it as a dimension. As a second step, in the context of #9910, we can then work on removing the fingerprinting entirely.

keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.

Sounds like what I'm proposing would be aligned with that? However, I think we shouldn't add dimensions_fingerprint to the more specific aws integrations that previously didn't have that field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch.

Thanks for the heads up, I'm restoring the fingerprint then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dimensions_fingerprint field restored!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can remove mapping the fingerprint field as a dimension.

Since the new mapping will take place after the look-ahead time has passed, don't we also need to keep the dimensions_fingerprint as as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to keep mapping the field. But we don't need to map it as a dimension, as all dimension.* fields are mapped as a dimension.

Copy link
Contributor Author

@zmoog zmoog Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The current index will continue to map field using the "old" mappings
  • when ES performs the rollover we'll get the "new" mappings.
  • We need the pipeline to keep the old mapping run while look-ahead time passes.

Got it.

@zmoog zmoog force-pushed the zmoog/aws-dimensions-mappings branch from 9641abe to 7da35b0 Compare December 30, 2024 15:17
@zmoog
Copy link
Contributor Author

zmoog commented Dec 30, 2024

Could you add a section in the description on how to test this locally. A use case that would prove that this uniformity solves the problem of data correlation?

I guess the main issue for users is (quoting @felixbarny) "[they] can’t build visualizations with flattened fields in Lens, because Kibana doesn’t recognize fields within flattened objects."

@MichaelKatsoulis, I updated the "how to test this locally" section accordingly.

Copy link
Member

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Please create follow-ups for removing dimensions_fingerprint (#9910), and using subobjects: false for dimensions.*.

@zmoog
Copy link
Contributor Author

zmoog commented Dec 30, 2024

I tested the upgrade from AWS integration 2.36.2 to 2.37.0 (the unreleased changes from this PR) with the following steps:

  • Started a brand new local stack (8.17.0)
  • Installed AWS integration 2.36.2 (with aws.dimensions as flattened)
  • Started sending 1 document every 5 secs, including a field containing a sequence number
  • Upgraded the AWS integration to 2.37.0 (unreleased, this PR)
  • Waited for the rollout to take effect (checking settings.index.time_series | .start_time, .end_time)
  • Checked that the the data stream didn't lose any sequence number

More details on selected steps.

Started sending 1 document every 5 secs

I used the following shell script:

sequence=0
while true
do
cat > metrics.json <<EOF
{
  "@timestamp": "$(date '+%Y-%m-%dT%H:%M:%S%z')",
    "aws": {
      "dimensions": {
        "name": "Maurizio Branca",
        "AutoScalingGroupName": "whatever"
      },
      "metric": {
        "cpu": 10,
        "sequence": $sequence
      }
  }
}  
EOF

((sequence++))

cat metrics.json | jq -c | es docs bulk -f - -i metrics-aws.cloudwatch_metrics-sdh5390
sleep 5
done

The scripts sends a document like the following every 5 secs:

{
  "@timestamp": "2024-12-31T00:14:58+0100",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
    },
    "metric": {
      "cpu": 10,
      "sequence": 270
    }
  }
}

Waited for the rollout to take effect

Right after the upgrade, Fleet/ES creates a new -000002 index, but keeps writing to the -000001 index until the settings.index.time_series.end_time elapses.

Old index -000001:

{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000001": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:05:37.000Z",
          "start_time": "2024-12-30T20:35:37.000Z"
        },

New index -000002:

{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000002": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:35:37.000Z",
          "start_time": "2024-12-30T23:05:37.000Z"
        },

Checked that the the data stream didn't lose any sequence number

At 2024-12-30T23:05:37.000Z, ES started to write to the new index without losing a sequence number:

CleanShot 2024-12-31 at 00 06 09@2x

@felixbarny
Copy link
Member

The testing methodology looks great!

Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests. The only thing that should be required in an integration is to provide a templated sample document. The rest could be a generic infrastructure. WDYT @jsoriano?

@jsoriano
Copy link
Member

Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests.

We have an open issue about testing integration upgrades: elastic/elastic-package#1831

The only thing that should be required in an integration is to provide a templated sample document. The rest could be a generic infrastructure.

Some integrations start dummy services with docker compose to generate events. But for this case we'd be missing the validation on the sequence numbers.

Copy link
Contributor

@tetianakravchenko tetianakravchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmoog please add changelog entry and change the version of the integration

@@ -5,8 +5,10 @@
type: flattened
description: |
Tag key value pairs from aws resources.
- name: dimensions
type: flattened
- name: dimensions.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmoog sure, lets add it in dedicated PR 👍

@tetianakravchenko
Copy link
Contributor

@zmoog great approach to test this PR - #11883 (comment)! And thank you for sharing!

@zmoog zmoog changed the title [aws] [metrics] Map aws.dimensions field as object [aws] [cloudwatch_metrics] Map aws.dimensions field as object Jan 2, 2025
Copy link
Contributor

@ritalwar ritalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just ensure the changelog entry and version update as suggested already.

Copy link
Contributor

@chemamartinez chemamartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zmoog
Copy link
Contributor Author

zmoog commented Jan 2, 2025

@tetianakravchenko, changelog updated!

@zmoog zmoog enabled auto-merge (squash) January 2, 2025 17:37
@elasticmachine
Copy link

💚 Build Succeeded

History

cc @zmoog

@zmoog zmoog merged commit 9c7a300 into elastic:main Jan 2, 2025
5 checks passed
@elastic-vault-github-plugin-prod

Package aws - 2.37.0 containing this change is available at https://epr.elastic.co/package/aws/2.37.0/

@zmoog zmoog deleted the zmoog/aws-dimensions-mappings branch January 3, 2025 12:01
zmoog added a commit to zmoog/integrations that referenced this pull request Jan 7, 2025
…c#11883)

Change the mapping type for the `aws.dimensions` field from `flattened` to `object`.

Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead.

We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration.
 If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.
# Conflicts:
#	packages/aws/changelog.yml
#	packages/aws/manifest.yml
zmoog added a commit that referenced this pull request Jan 8, 2025
…rt of #11883) (#12237)

Change the mapping type for the `aws.dimensions` field from `flattened` to `object`.

Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead.

We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration.
 If all data streams use the exact mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.

---------

Co-authored-by: muthu-mps <101238137+muthu-mps@users.noreply.github.com>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:aws AWS Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bring uniformity in aws.dimensions.* fields mapping
9 participants