[aws] [cloudwatch_metrics] Map aws.dimensions field as object #11883

zmoog · 2024-11-26T16:19:25Z

Proposed commit message

Change the mapping type for the aws.dimensions field from flattened to object.

Currently, all *_metrics data streams but one use the object mapping. The cloudwatch_metrics data stream uses the flattened type instead.

We need to unify the mapping of aws.dimensions across all metrics-related data streams in the AWS integration.
If all data streams use the same mapping for aws.dimensions, users will be able to query and build dashboards that correlate data across different data streams.

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
~~I have verified that any added dashboard complies with Kibana's Dashboard good practices~~

Author's Checklist

[aws] Bump package-spec version to 3.3.1 #11893 (to get subobjects: false support)
Add a case for "How to test this PR locally" to prove the new mapping makes a difference for query and dashboard purposes.

How to test this PR locally

Start the local stack (use 8.17.0)
Install the latest released integration (using AWS 2.36.2)
Check the component template
Post the test document
Try to visualize the metric in Kibana
Upgrade the integration to this PR version
Delete the data stream and re-index the same test document
Re-try to visualize the metric in Kibana using the new mapping

Check the component template

With 2.36.2, the component template should look similar to:

// GET _component_template/metrics-aws.cloudwatch_metrics@package
{
  "aws": {
    "dynamic": true,
    "type": "object",
    "properties": {
      "dimensions_fingerprint": {
        "time_series_dimension": true,
        "type": "keyword"
      },
      "cloudwatch": {
        "properties": {
          "namespace": {
            "ignore_above": 1024,
            "type": "keyword"
          }
        }
      },
      "tags": {
        "type": "flattened"
      },
      "dimensions": {
        "type": "flattened"
      }
    }
  }
}

Post the test document

Using the Dev Tools, post the following test document:

// POST metrics-aws.cloudwatch_metrics-sdh5390/_doc
{
  "@timestamp": "2024-12-30T14:57:07+01:00",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
      // "ImageId": "whatever",
      // "InstanceId": "whatever"
      // "InstanceType": "whatever"
    },
    "metric": {
      "cpu": 10
    }
  }
}

Try to visualize the metric in Kibana

If I try to visualize the CPU metric in Kibana, the dimension subfields in aws.dimensions.* are not available:

Upgrade the integration to this PR version

Bump the integration version (for example, 2.37.0), and build and update the integration:

elastic-package build && elastic-package stack up -d -v --services package-registry

Delete the data stream and re-index the same test document

// DELETE _data_stream/metrics-aws.cloudwatch_metrics-sdh5390

// POST metrics-aws.cloudwatch_metrics-sdh5390/_doc
{
  "@timestamp": "2024-12-30T14:57:07+01:00",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
      // "ImageId": "whatever",
      // "InstanceId": "whatever"
      // "InstanceType": "whatever"
    },
    "metric": {
      "cpu": 10
    }
  }
}

If we check the mapping, we should now see a dynamic template:

Re-try to visualize the metric in Kibana using the new mapping

With the updated mapping, we can now use aws.dimensions.* fields in Kibana to break down values by a dimension:

Related issues

Closes Bring uniformity in aws.dimensions.* fields mapping #11806

elastic-vault-github-plugin-prod · 2024-11-26T17:54:21Z

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

elastic-sonarqube · 2024-11-26T17:54:36Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

zmoog · 2024-11-27T11:21:41Z

All the other *_metrics data streams use the object mapping.

For example, here is the dimensions definition for EC2 metrics:

# packages/aws/data_stream/ec2_metrics/fields/fields.yml
- name: aws
  type: group
  fields:
    - name: dimensions
      type: group
      fields:
        - name: AutoScalingGroupName
          type: keyword
          dimension: true
          description: An Auto Scaling group is a collection of instances you define if you're using Auto Scaling.
        - name: ImageId
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI)
        - name: InstanceId
          type: keyword
          dimension: true
          description: Amazon EC2 instance ID
        - name: InstanceType
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running with this specified instance type.

Since all the other metrics data streams use this mapping, it probably makes sense to use the object type in cloudwatch_metrics as well, instead of the flattened type. We can't do the opposite, since TSDB doesn't allow dimension: true on the flattened type.
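To make the practical difference between the two mapping types concrete, here is a minimal sketch in Python. It is not Elasticsearch code; the helper name `mapped_field_names` and its `flattened_paths` parameter are hypothetical and only model the idea that an object mapping exposes one field per leaf, while a flattened mapping exposes the whole object as a single field.

```python
def mapped_field_names(doc, flattened_paths=frozenset(), prefix=""):
    """Return the dotted field names a mapping would expose for `doc`.

    Nested objects are expanded into one field per leaf. Any path listed in
    `flattened_paths` is treated as a single opaque field (an assumption that
    models the `flattened` type for illustration only).
    """
    names = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and path not in flattened_paths:
            # Object mapping: recurse so every leaf gets its own field name.
            names.extend(mapped_field_names(value, flattened_paths, prefix=f"{path}."))
        else:
            # Leaf value, or an object mapped as flattened: one field only.
            names.append(path)
    return names


doc = {
    "aws": {
        "dimensions": {"name": "Maurizio Branca", "AutoScalingGroupName": "whatever"},
        "metric": {"cpu": 10},
    }
}

as_flattened = mapped_field_names(doc, flattened_paths={"aws.dimensions"})
as_object = mapped_field_names(doc)
print(as_flattened)
print(as_object)
```

With the flattened variant, only `aws.dimensions` shows up as a field, which matches the symptom that individual dimension subfields cannot be selected when building a visualization.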

tetianakravchenko

Please add a changelog entry and bump the version.

tetianakravchenko · 2024-11-27T11:40:22Z

packages/aws/data_stream/cloudwatch_metrics/fields/package-fields.yml

@@ -5,8 +5,10 @@
      type: flattened
      description: |
        Tag key value pairs from aws resources.
-    - name: dimensions
-      type: flattened
+    - name: dimensions.*


Should subobjects: false be added explicitly, for custom dimension keys?

Yeah, but this requires a package-spec update, which needs several changes. We need to make these changes happen in a dedicated PR.

ok, maybe add a comment with this information in the field definition as a reminder

I added a task to upgrade the package-spec to (at least) 3.1.0.

(it's simpler than expected: there were a lot of errors, but they are instances of the same class of error)

The PR to bump the package-spec version to 3.3.0 is #11893

@tetianakravchenko, are you okay to bump package-spec to 3.3.0 and introduce subobjects: false in dedicated PRs?

@zmoog sure, lets add it in dedicated PR 👍

MichaelKatsoulis · 2024-11-27T11:55:32Z

Looks good to me. Could you add a section in the description on how to test this locally? A use case that would prove that this uniformity solves the problem of data correlation?

felixbarny · 2024-11-27T12:03:34Z

packages/aws/data_stream/cloudwatch_metrics/fields/package-fields.yml

+    - name: dimensions.*
+      type: object
+      object_type: keyword
+      object_type_mapping_type: "*"


Isn't * the default? If so, this can be removed.

If I am not looking in the wrong place, the object_type_mapping_type definition does not have a default value.

What I meant is that I think you can remove object_type_mapping_type completely, which will also match all mapping types, as no explicit subset of mapping types has been specified.

felixbarny · 2024-11-27T12:06:17Z

packages/aws/data_stream/cloudwatch_metrics/fields/package-fields.yml

+    - name: dimensions.*
+      type: object
+      object_type: keyword
+      object_type_mapping_type: "*"
      description: |
        Metric dimensions.
    - name: dimensions_fingerprint


I think we could remove the fingerprint in favor of mapping aws.dimensions.* as dimensions. If we decide to do that, we should apply some of the learnings from the compatibility issues introduced in the prometheus integration when we removed the fingerprint and also perform dedicated upgrade tests.

As far as I can see, this change was reverted in the end for the Prometheus integration (#10766).
There is a meta issue - #9910, I think such change should be part of this meta issue

The part that wasn't reverted is that labels are still mapped as dimensions and the fingerprint is not mapped as a dimension.

There is a meta issue - #9910, I think such change should be part of this meta issue

Yeah, I'm also in favour of leaving this change to the meta issue #9910 and keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.

WDYT @felixbarny ?

I'd split this into two parts. What I think we should do in this PR is the same as what we ended up doing with the Prometheus integration: map dimensions.* as TSDB dimensions, keep the fingerprint processing and field, but not map it as a dimension. As a second step, in the context of #9910, we can then work on removing the fingerprinting entirely.

keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.

Sounds like what I'm proposing would be aligned with that? However, I think we shouldn't add dimensions_fingerprint to the more specific aws integrations that previously didn't have that field.

Ouch.

Thanks for the heads up, I'm restoring the fingerprint then.

dimensions_fingerprint field restored!

But you can remove mapping the fingerprint field as a dimension.

Since the new mapping will take effect only after the look-ahead time has passed, don't we also need to keep the dimensions_fingerprint as well?

Yes, we need to keep mapping the field. But we don't need to map it as a dimension, as all dimensions.* fields are mapped as dimensions.

The current index will continue to map fields using the "old" mappings.

When ES performs the rollover, we'll get the "new" mappings.

We need the pipeline to keep supporting the old mapping while the look-ahead time passes.

Got it.

zmoog · 2024-12-30T16:03:41Z

Could you add a section in the description on how to test this locally? A use case that would prove that this uniformity solves the problem of data correlation?

I guess the main issue for users is (quoting @felixbarny) "[they] can’t build visualizations with flattened fields in Lens, because Kibana doesn’t recognize fields within flattened objects."

@MichaelKatsoulis, I updated the "how to test this locally" section accordingly.

felixbarny

LGTM!
Please create follow-ups for removing dimensions_fingerprint (#9910), and using subobjects: false for dimensions.*.

zmoog · 2024-12-30T23:20:58Z

I tested the upgrade from AWS integration 2.36.2 to 2.37.0 (the unreleased changes from this PR) with the following steps:

Started a brand new local stack (8.17.0)
Installed AWS integration 2.36.2 (with aws.dimensions as flattened)
Started sending 1 document every 5 secs, including a field containing a sequence number
Upgraded the AWS integration to 2.37.0 (unreleased, this PR)
Waited for the rollover to take effect (checking settings.index.time_series | .start_time, .end_time)
Checked that the data stream didn't lose any sequence number

More details on selected steps.

Started sending 1 document every 5 secs

I used the following shell script:

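#!/usr/bin/env bash
# Send one test document every 5 seconds. Each document carries an
# increasing sequence number, so lost documents show up as gaps later.
sequence=0
while true
do
cat > metrics.json <<EOF
{
  "@timestamp": "$(date '+%Y-%m-%dT%H:%M:%S%z')",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
    },
    "metric": {
      "cpu": 10,
      "sequence": $sequence
    }
  }
}
EOF

((sequence++))

# Compact the document to a single line and index it into the data stream.
jq -c . metrics.json | es docs bulk -f - -i metrics-aws.cloudwatch_metrics-sdh5390
sleep 5
done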

The script sends a document like the following every 5 secs:

{
  "@timestamp": "2024-12-31T00:14:58+0100",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
    },
    "metric": {
      "cpu": 10,
      "sequence": 270
    }
  }
}

Waited for the rollover to take effect

Right after the upgrade, Fleet/ES creates a new -000002 index, but keeps writing to the -000001 index until the settings.index.time_series.end_time elapses.

Old index -000001:

{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000001": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:05:37.000Z",
          "start_time": "2024-12-30T20:35:37.000Z"
        },

New index -000002:

{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000002": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:35:37.000Z",
          "start_time": "2024-12-30T23:05:37.000Z"
        },
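The hand-over between the two backing indices can be sketched as follows. This is a minimal illustration, not Elasticsearch code: it assumes a document is accepted by the index whose `start_time` is at or before the document's `@timestamp` and whose `end_time` is strictly after it. The helper names `parse_ts` and `pick_backing_index` and the short index labels are hypothetical; the time ranges are the ones shown in the settings above.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_backing_index(timestamp, windows):
    """Return the name of the index whose [start_time, end_time) contains `timestamp`."""
    ts = parse_ts(timestamp)
    for name, (start, end) in windows.items():
        if parse_ts(start) <= ts < parse_ts(end):
            return name
    return None  # no backing index accepts this timestamp


# Time ranges copied from the index settings shown above.
windows = {
    "-000001": ("2024-12-30T20:35:37.000Z", "2024-12-30T23:05:37.000Z"),
    "-000002": ("2024-12-30T23:05:37.000Z", "2024-12-30T23:35:37.000Z"),
}

before_cutover = pick_backing_index("2024-12-30T23:05:36+00:00", windows)
at_cutover = pick_backing_index("2024-12-30T23:05:37+00:00", windows)
local_time = pick_backing_index("2024-12-31T00:14:58+01:00", windows)
print(before_cutover, at_cutover, local_time)
```

Under this assumption, documents keep landing in the old index (and its old mapping) until the cut-over instant, which is why the pipeline has to stay compatible with both mappings for a while.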

Checked that the data stream didn't lose any sequence number

At 2024-12-30T23:05:37.000Z, ES started to write to the new index without losing a sequence number:
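A check like this can also be scripted. The following is a minimal sketch, assuming the `aws.metric.sequence` values have already been fetched from the data stream into a plain list; the function name `missing_sequence_numbers` is hypothetical and the sample values are made up for illustration.

```python
def missing_sequence_numbers(observed):
    """Return the sequence numbers absent between the lowest and highest observed values."""
    seen = set(observed)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


# Illustrative values only: a complete run and a run with two lost documents.
complete_run = [268, 269, 270, 271, 272]
lossy_run = [268, 269, 272, 273]

print(missing_sequence_numbers(complete_run))
print(missing_sequence_numbers(lossy_run))
```

An empty result means every document sent across the rollover was indexed. Duplicates and out-of-order arrival do not affect the result, since the values are compared as a set.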

felixbarny · 2024-12-31T07:44:24Z

The testing methodology looks great!

Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests. The only thing that should be required in an integration is to provide a templated sample document. The rest could be a generic infrastructure. WDYT @jsoriano?

jsoriano · 2024-12-31T10:02:23Z

Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests.

We have an open issue about testing integration upgrades: elastic/elastic-package#1831

The only thing that should be required in an integration is to provide a templated sample document. The rest could be a generic infrastructure.

Some integrations start dummy services with docker compose to generate events. But for this case we'd be missing the validation on the sequence numbers.

tetianakravchenko

@zmoog please add a changelog entry and bump the version of the integration

tetianakravchenko · 2025-01-02T11:18:12Z

packages/aws/data_stream/cloudwatch_metrics/fields/package-fields.yml

@@ -5,8 +5,10 @@
      type: flattened
      description: |
        Tag key value pairs from aws resources.
-    - name: dimensions
-      type: flattened
+    - name: dimensions.*


@zmoog sure, lets add it in dedicated PR 👍

tetianakravchenko · 2025-01-02T11:23:20Z

@zmoog great approach to test this PR - #11883 (comment)! And thank you for sharing!

ritalwar

LGTM! Just make sure to add the changelog entry and the version update, as already suggested.

chemamartinez

LGTM

zmoog · 2025-01-02T16:56:14Z

@tetianakravchenko, changelog updated!

elastic-sonarqube · 2025-01-02T18:23:39Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
25.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

elasticmachine · 2025-01-02T18:23:45Z

💚 Build Succeeded

Buildkite Build
Commit: 3851a3a

History

💚 Build #19924 succeeded 0b76887
💔 Build #19921 failed 7da35b0
💚 Build #18721 succeeded 9641abe

cc @zmoog

elastic-vault-github-plugin-prod · 2025-01-02T18:39:15Z

Package aws - 2.37.0 containing this change is available at https://epr.elastic.co/package/aws/2.37.0/

zmoog self-assigned this Nov 26, 2024

zmoog changed the title ~~[AWS] Map aws.dimensions as object of keywords~~ [aws] [metrics] Map aws.dimensions field as object of keywords Nov 26, 2024

andrewkroh added the Integration:aws AWS label Nov 26, 2024

zmoog changed the title ~~[aws] [metrics] Map aws.dimensions field as object of keywords~~ [aws] [metrics] Map aws.dimensions field as object Nov 26, 2024

zmoog added Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] enhancement New feature or request labels Nov 26, 2024

zmoog marked this pull request as ready for review November 27, 2024 11:20

zmoog requested review from a team as code owners November 27, 2024 11:20

zmoog requested review from jsoriano and felixbarny November 27, 2024 11:20

tetianakravchenko reviewed Nov 27, 2024

felixbarny reviewed Nov 27, 2024

zmoog mentioned this pull request Nov 27, 2024

[aws] Bump package-spec version to 3.3.1 #11893

Open

5 tasks

zmoog added 2 commits December 30, 2024 16:12

Map aws.dimensions as object of keywords

7aadaf7

Define aws.dimensions using a dynamic template

7da35b0

zmoog force-pushed the zmoog/aws-dimensions-mappings branch from 9641abe to 7da35b0 Compare December 30, 2024 15:17

Update generated docs

05cc164

zmoog added 2 commits December 30, 2024 17:11

Restore dimensions_fingerprint

06e8b93

This avoids data loss on integration upgrades.

dimensions_fingerprint is no longer a dimension

0b76887

felixbarny approved these changes Dec 30, 2024

jsoriano approved these changes Dec 31, 2024

tetianakravchenko reviewed Jan 2, 2025

zmoog changed the title ~~[aws] [metrics] Map aws.dimensions field as object~~ [aws] [cloudwatch_metrics] Map aws.dimensions field as object Jan 2, 2025

ritalwar approved these changes Jan 2, 2025

chemamartinez approved these changes Jan 2, 2025

Update changelog

3851a3a

tetianakravchenko approved these changes Jan 2, 2025

zmoog enabled auto-merge (squash) January 2, 2025 17:37

zmoog merged commit 9c7a300 into elastic:main Jan 2, 2025
5 checks passed

zmoog deleted the zmoog/aws-dimensions-mappings branch January 3, 2025 12:01

zmoog mentioned this pull request Jan 7, 2025

[aws] [cloudwatch_metrics] Map aws.dimensions field as object (backport of #11883) #12237

Merged

6 tasks
