[aws] [cloudwatch_metrics] Map aws.dimensions field as object #11883
Conversation
For example, here is the mapping for `ec2_metrics`:

```yaml
# packages/aws/data_stream/ec2_metrics/fields/fields.yml
- name: aws
  type: group
  fields:
    - name: dimensions
      type: group
      fields:
        - name: AutoScalingGroupName
          type: keyword
          dimension: true
          description: An Auto Scaling group is a collection of instances you define if you're using Auto Scaling.
        - name: ImageId
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI).
        - name: InstanceId
          type: keyword
          dimension: true
          description: Amazon EC2 instance ID.
        - name: InstanceType
          type: keyword
          dimension: true
          description: This dimension filters the data you request for all instances running with this specified instance type.
```

Since ALL metrics data streams use this mapping, it probably makes sense to use the same mapping in `cloudwatch_metrics` as well.
please add a changelog entry and the version change
```diff
@@ -5,8 +5,10 @@
       type: flattened
       description: |
         Tag key value pairs from aws resources.
-    - name: dimensions
-      type: flattened
+    - name: dimensions.*
```
Should `subobjects: false` be added explicitly, for custom dimension keys?
Yeah, but this requires a package-spec update, which needs several changes. We need to make these changes happen in a dedicated PR.
ok, maybe add a comment with this information in the field definition as a reminder
I added a task to upgrade the package-spec to (at least) 3.1.0.
(it's simpler than expected: there were a lot of errors, but they are instances of the same class of error)
The PR to bump the package-spec version to 3.3.0 is #11893
@tetianakravchenko, are you okay to bump package-spec to 3.3.0 and introduce `subobjects: false` in dedicated PRs?
@zmoog sure, let's add it in a dedicated PR 👍
Looks good to me. Could you add a section to the description on how to test this locally, with a use case that would prove that this uniformity solves the problem of data correlation?
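To illustrate the correlation use case, here is a minimal Python sketch (the sample documents and the `correlate` helper are hypothetical, not part of the integration) showing how documents from two data streams can be joined once they share the same `aws.dimensions` structure:

```python
# Hypothetical sample documents from two AWS metrics data streams.
cloudwatch_docs = [
    {"data_stream": "aws.cloudwatch_metrics",
     "aws": {"dimensions": {"InstanceId": "i-0123"}, "metric": {"cpu": 10}}},
]
ec2_docs = [
    {"data_stream": "aws.ec2_metrics",
     "aws": {"dimensions": {"InstanceId": "i-0123"}, "metric": {"cpu": 12}}},
]

def correlate(docs_a, docs_b, dimension):
    """Pair documents from two data streams that share a dimension value."""
    index = {}
    for doc in docs_a:
        key = doc["aws"]["dimensions"].get(dimension)
        index.setdefault(key, []).append(doc)
    return [(a, b) for b in docs_b
            for a in index.get(b["aws"]["dimensions"].get(dimension), [])]

pairs = correlate(cloudwatch_docs, ec2_docs, "InstanceId")
```

With a unified mapping, the same kind of join can be done in Lens or ES|QL by breaking down both data streams on `aws.dimensions.InstanceId`.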
```yaml
- name: dimensions.*
  type: object
  object_type: keyword
  object_type_mapping_type: "*"
```
Isn't `*` the default? If so, this can be removed.
Unless I am looking in the wrong place, the `object_type_mapping_type` definition does not have a default value.
What I meant is that I think you can remove `object_type_mapping_type` completely, which will also match all mapping types, as no explicit subset of mapping types has been specified.
```yaml
- name: dimensions.*
  type: object
  object_type: keyword
  object_type_mapping_type: "*"
  description: |
    Metric dimensions.
- name: dimensions_fingerprint
```
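For background, `dimensions_fingerprint` is derived from the dimension key-value pairs so the whole set of dimensions can act as a single time-series dimension. A minimal Python sketch of the idea; the actual integration computes this in an ingest pipeline, so the hashing details here are illustrative only:

```python
import hashlib
import json

def dimensions_fingerprint(dimensions):
    """Hash the sorted dimension key-value pairs into a stable fingerprint."""
    # Sorting keys makes the fingerprint independent of field order.
    canonical = json.dumps(dimensions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = dimensions_fingerprint({"InstanceId": "i-0123", "ImageId": "ami-42"})
b = dimensions_fingerprint({"ImageId": "ami-42", "InstanceId": "i-0123"})
```

The two calls produce the same fingerprint: the same dimensions identify the same time series regardless of key order.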
I think we could remove the fingerprint in favor of mapping `aws.dimensions.*` as dimensions. If we decide to do that, we should apply some of the learnings from the compatibility issues introduced in the prometheus integration when we removed the fingerprint, and also perform dedicated upgrade tests.
The part that wasn't reverted is that labels are still mapped as dimensions and the fingerprint is not mapped as a dimension.
There is a meta issue, #9910; I think such a change should be part of it.

Yeah, I'm also in favour of leaving this change to the meta issue #9910 and keeping the PR focused on having the same `aws.dimensions` mapping across the AWS integrations.

WDYT @felixbarny?
I'd split this into two parts. What I think we should do in this PR is the same as what we ended up doing with the Prometheus integration: map `dimensions.*` as TSDB dimensions, keep the fingerprint processing and field, but not map it as a dimension. As a second step, in the context of #9910, we can then work on removing the fingerprinting entirely.

> keep the PR focus to have the same aws.dimensions mapping across the AWS integrations.

Sounds like what I'm proposing would be aligned with that? However, I think we shouldn't add `dimensions_fingerprint` to the more specific AWS integrations that previously didn't have that field.
Ouch.
Thanks for the heads up, I'm restoring the fingerprint then.
`dimensions_fingerprint` field restored!
But you can remove mapping the fingerprint field as a dimension.

Since the new mapping will take effect only after the look-ahead time has passed, don't we also need to keep the `dimensions_fingerprint` field as well?
Yes, we need to keep mapping the field. But we don't need to map it as a dimension, as all `dimensions.*` fields are already mapped as dimensions.
- The current index will continue to map fields using the "old" mappings.
- When ES performs the rollover, we'll get the "new" mappings.
- We need the pipeline to keep the old mapping running while the look-ahead time passes.
Got it.
I guess the main issue for users is (quoting @felixbarny) "[they] can’t build visualizations with flattened fields in Lens, because Kibana doesn’t recognize fields within flattened objects." @MichaelKatsoulis, I updated the "how to test this locally" section accordingly.
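The practical difference between the two mapping types can be sketched in Python; the `queryable_paths` helper is purely illustrative, but it captures why Lens cannot break down by a subfield of a `flattened` field while it can with `object` plus a dynamic template:

```python
def queryable_paths(field_name, doc_value, mapping_type):
    """Return the field paths Kibana can see for a mapped field."""
    if mapping_type == "flattened":
        # The whole object is indexed under a single field name;
        # subfields are not visible to Lens.
        return [field_name]
    if mapping_type == "object":
        # Each subfield gets its own (dynamically mapped) leaf field.
        return [f"{field_name}.{key}" for key in sorted(doc_value)]
    raise ValueError(f"unsupported mapping type: {mapping_type}")

dims = {"AutoScalingGroupName": "my-asg", "InstanceId": "i-0123"}
print(queryable_paths("aws.dimensions", dims, "flattened"))
print(queryable_paths("aws.dimensions", dims, "object"))
```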
This avoids data loss on integration upgrades.
LGTM!

Please create follow-ups for removing `dimensions_fingerprint` (#9910), and for using `subobjects: false` for `dimensions.*`.
I tested the upgrade from AWS integration 2.36.2 to 2.37.0 (the unreleased changes from this PR) with the following steps.

More details on selected steps:

**Started sending 1 document every 5 secs**

I used the following shell script:

```shell
sequence=0
while true
do
  cat > metrics.json <<EOF
{
  "@timestamp": "$(date '+%Y-%m-%dT%H:%M:%S%z')",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
    },
    "metric": {
      "cpu": 10,
      "sequence": $sequence
    }
  }
}
EOF
  ((sequence++))
  cat metrics.json | jq -c | es docs bulk -f - -i metrics-aws.cloudwatch_metrics-sdh5390
  sleep 5
done
```

The script sends a document like the following every 5 secs:

```json
{
  "@timestamp": "2024-12-31T00:14:58+0100",
  "aws": {
    "dimensions": {
      "name": "Maurizio Branca",
      "AutoScalingGroupName": "whatever"
    },
    "metric": {
      "cpu": 10,
      "sequence": 270
    }
  }
}
```

**Waited for the rollover to take effect**

Right after the upgrade, Fleet/ES creates a new index.

Old index:

```json
{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000001": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:05:37.000Z",
          "start_time": "2024-12-30T20:35:37.000Z"
        },
```

New index:

```json
{
  ".ds-metrics-aws.cloudwatch_metrics-sdh5390-2024.12.30-000002": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "1000",
            "ignore_dynamic_beyond_limit": "true"
          }
        },
        "hidden": "true",
        "time_series": {
          "end_time": "2024-12-30T23:35:37.000Z",
          "start_time": "2024-12-30T23:05:37.000Z"
        },
```

**Checked that the data stream didn't lose any sequence number**
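This last step, confirming that no sequence numbers were lost across the rollover, can be automated with a small script; a sketch assuming the `aws.metric.sequence` values have already been fetched from the data stream:

```python
def find_gaps(sequences):
    """Return the sequence numbers missing between the min and max observed."""
    seen = set(sequences)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# Example: a gap would mean a document was dropped during the upgrade.
print(find_gaps([0, 1, 2, 4, 5]))
print(find_gaps([0, 1, 2, 3]))
```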
The testing methodology looks great! Longer-term, I wish we had something similar to this for all integrations as automated upgrade tests. The only thing that should be required in an integration is to provide a templated sample document. The rest could be generic infrastructure. WDYT @jsoriano?
We have an open issue about testing integration upgrades: elastic/elastic-package#1831
Some integrations start dummy services with docker compose to generate events. But for this case we'd be missing the validation of the sequence numbers.
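The "templated sample document" idea could be as small as the sketch below; the template format and the `render_sample` helper are hypothetical, not part of elastic-package:

```python
import json
from datetime import datetime, timezone
from string import Template

# Hypothetical per-integration sample document template.
SAMPLE = Template("""\
{
  "@timestamp": "$timestamp",
  "aws": {
    "dimensions": {"AutoScalingGroupName": "whatever"},
    "metric": {"cpu": 10, "sequence": $sequence}
  }
}
""")

def render_sample(sequence):
    """Render one event ready to be bulk-indexed by a generic test harness."""
    doc = SAMPLE.substitute(
        timestamp=datetime.now(timezone.utc).isoformat(),
        sequence=sequence,
    )
    return json.loads(doc)  # validate it is well-formed JSON

event = render_sample(42)
```

A generic harness could then index the rendered events before and after the upgrade and verify the sequence continuity.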
@zmoog please add a changelog entry and bump the version of the integration
@zmoog great approach to test this PR - #11883 (comment)! And thank you for sharing!
LGTM! Just ensure the changelog entry and version update as suggested already.
LGTM
@tetianakravchenko, changelog updated!
cc @zmoog
Package aws - 2.37.0 containing this change is available at https://epr.elastic.co/package/aws/2.37.0/
…rt of #11883) (#12237)

Change the mapping type for the `aws.dimensions` field from `flattened` to `object`. Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead. We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration. If all data streams use the exact same mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.

Co-authored-by: muthu-mps <101238137+muthu-mps@users.noreply.github.com>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Proposed commit message

Change the mapping type for the `aws.dimensions` field from `flattened` to `object`.

Currently, all `*_metrics` data streams but one use the `object` mapping. The `cloudwatch_metrics` data stream uses the `flattened` type instead.

We need to unify the mapping of `aws.dimensions` across all metrics-related data streams in the AWS integration.

If all data streams use the exact same mapping for `aws.dimensions`, users will be able to query and build a dashboard that correlates data across different data streams.

Checklist

- I have added an entry to the `changelog.yml` file.
- I have verified that any added dashboard complies with Kibana's Dashboard good practices.

Author's Checklist

- Follow up on mapping custom dimension keys (`subobjects: false` support)

How to test this PR locally
Check the component template
With 2.36.2, the component template should be something similar to:
Post the test document
Using the Dev Tools, post the following test document:
Try to visualize the metric in Kibana
If I try to visualize the CPU metric in Kibana, the dimension subfields in `aws.dimensions.*` are not available.

Upgrade the integration to this PR version
Bump the integration version (for example, 2.37.0), then build and update the integration:

```shell
elastic-package build && elastic-package stack up -d -v --services package-registry
```
Delete the data stream and re-index the same test document
If we check the mapping, we should now see a dynamic template:
Re-try to visualize the metric in Kibana using the new mapping
With the updated mapping, we can now use `aws.dimensions.*` fields in Kibana to break down values by a dimension.

Related issues

- `aws.dimensions.*` fields mapping #11806