-
Notifications
You must be signed in to change notification settings - Fork 5k
Concurrent fetch of azure metricdefinitions and batchApi usage #41790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent fetch of azure metricdefinitions and batchApi usage #41790
Conversation
This pull request is now in conflicts. Could you fix it? 🙏
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
Microsoft.DocumentDb/databaseAccounts (1 resource)resource type:
Activity:
# x-pack/metricbeat/modules.d/azure.yml
- module: azure
metricsets:
- database_account
enabled: true
period: 300s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600s
UPDATE: I didn't build the right version, I'm re-testing 9.0.0 8.17.19.0.0
Issues(1) Timegrain for azure.database_account.create_account.count is emptyIn version 8.17.1, the timegrain for this field is PT5M. (2) The azure.database_account.service_availability.avg (timegrain PT1H) is missingVersion 9.0.0 always collects 7 documents with PT5M, while version 8.17.1 collect 7 documents PT5M + 1 document PT1H during the first iteration and again every 60 mins. Is 9.0.0 missing the PT1H document on the first iteration? Waiting for the next iteration to double-check. After 75 mins, no UPDATE: tested by @MichaelKatsoulis I managed to collect |
UPDATE: I built the wrong version, I'm re-testing 9.0.0 with Microsoft.DocumentDb/databaseAccounts (1 resource) and I'll update the previous comment. My apologies for the noise. |
Microsoft.KeyVault/vaults (10 resources)resource type:
Activity:
- module: azure
metricsets:
- monitor
enabled: true
period: 60s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600s
resources:
- resource_query: "resourceType eq 'Microsoft.KeyVault/vaults'"
resource_group:
- "mbranca-az-scalability-kv-r10"
metrics:
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
- name: StatusCode
value: '*'
- name: StatusCodeClass
value: '*'
ignore_unsupported: true
name:
- ServiceApiLatency
- Availability
- ServiceApiResult
namespace: Microsoft.KeyVault/vaults
timegrain: PT1M
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
ignore_unsupported: true
name:
- ServiceApiHit
namespace: Microsoft.KeyVault/vaults
timegrain: PT1M
- dimensions:
- name: ActivityType
value: '*'
- name: ActivityName
value: '*'
- name: TransactionType
value: '*'
ignore_unsupported: true
name:
- SaturationShoebox
namespace: Microsoft.KeyVault/vaults
timegrain: PT1M Notes: When the key vaults are unused (like in this resource group), they only generates a subset of metrics:
8.17.1In progress. I can see the three metrics (Availability, API Hits, API Results), grouped in two documents. So 2 documents x 10 resources = 20 documents per iteration: 9.0.0In progress. First iterations are okay. I get the same number of documents (20) as 8.17.1 and same values. Still checking, but this case looks good. |
@MichaelKatsoulis, I found a couple of issues relate to timegrain in the Microsoft.DocumentDb/databaseAccounts (1 resource) test. |
Microsoft.ContainerRegistry/registries (1 resource)resource type:
Activity:
- module: azure
metricsets:
- container_registry
enabled: true
period: 300s
client_id: '${AZURE_CLIENT_ID:""}'
client_secret: '${AZURE_CLIENT_SECRET:""}'
tenant_id: '${AZURE_TENANT_ID:""}'
subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
refresh_list_interval: 600s Since we had issue with PT1H metrics, I tried another metricset with this timegrain. 8.17.1After one iteration, 8.17.1 collected:
9.0.0After one iteration, 8.17.1 collected:
Conclusion✅ With the recent code changes 8.17.1 and 9.0.0 yield the same outcome. Metrics docs |
…-azure-metricdefinitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance gains from the new sdk/monitor/query/azmetrics
package with batch API are extremely compelling.
I would love to simplify the internal structure, but I am also okay with going with the PR as-is, collect customer feedback, and switch to the batch API in the next release.
I added a few non-blocking comments for things we may want to address before merging.
_boolean_ | ||
Optional, by default is set to False. Set this to True when facing scalability issues. When configured, the azure batch api will be used | ||
to fetch metrics of multiple resources in one api call. | ||
Currently supported metricsets are monitor, container_registry, container_instance, container_service, compute_vm, compute_vm_scaleset, database_account and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add storage
to the list, or remove it because the metricset supports all the metricsets, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't storage in this list?
…d batchApi usage (#43923) * Concurrent fetch of azure metricdefinitions and batchApi usage (#41790) * Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum * Resolve conflicts --------- Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
…nd batchApi usage (#44243) * Concurrent fetch of azure metricdefinitions and batchApi usage (#41790) * Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc * Resolve conflicts --------- Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
…nd batchApi usage (#44241) The changes affect azure monitor and relevant metricsets. The list of metricsets affected are: - `monitor` - `container_registry` - `container_instance` - `container_service` - `compute_vm` - `compute_vm_scaleset` - `database_account` - `storage_account` A new configuration parameter is introduced `enable_batch_api` of type boolean. If set to `false`(default) nothing changes in the way the metrics are collected for these metricsets. If set to `true`: - The metric definitions of resources are collected asynchronously and write the results in a channel. - The channel is read and when the number of definitions collected reach 50 (batch API limit) - The metrics definitions are grouped based on criteria(1) and the azure BatchAPI is used to retrieve metrics of multiple resources with one api call. 1. Grouping criteria are - Namespace - SubscriptionID - Location - Names - TimeGrain - Dimensions
* Use concurrency in metricsdefinition collection * Change ResourceConfigurations.Metrics to a map * Use batch API * New queryResourceClient per location * Wait for 50 reource ids before fetching the metrics * Set timegrain if is equal to ''" * Use batch API as feature * Use baseclient to tackle code duplication * Add unit tests for concurrent fetching of metric definitions * Add batch client unit tests * Add support of batch API for storage accounts * Update docs and add unit tests form storage client * Split metric names by 20 (cherry picked from commit 13f8fde) # Conflicts: # go.mod # go.sum # metricbeat/docs/modules/azure.asciidoc
…nd batchApi usage (#44242) The changes affect azure monitor and relevant metricsets. The list of metricsets affected are: - `monitor` - `container_registry` - `container_instance` - `container_service` - `compute_vm` - `compute_vm_scaleset` - `database_account` - `storage_account` A new configuration parameter is introduced `enable_batch_api` of type boolean. If set to `false`(default) nothing changes in the way the metrics are collected for these metricsets. If set to `true`: - The metric definitions of resources are collected asynchronously and write the results in a channel. - The channel is read and when the number of definitions collected reach 50 (batch API limit) - The metrics definitions are grouped based on criteria(1) and the azure BatchAPI is used to retrieve metrics of multiple resources with one api call. 1. Grouping criteria are - Namespace - SubscriptionID - Location - Names - TimeGrain - Dimensions
return fmt.Errorf("no resources were found based on all the configurations options entered") | ||
} | ||
|
||
metricStores := make(map[ResDefGroupingCriteria]*MetricStore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metricStores only work on a single goroutine, so no mutex is needed?
groupedMetrics := map[ResDefGroupingCriteria][]Metric{ | ||
criteria: store.GetMetrics(), | ||
} | ||
metricValues := client.GetMetricsInBatch(groupedMetrics, referenceTime, report) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another points, if MetricStore need mutex, here should have an bug when another goroutine AddMetric into store.
bug for array copy will not share the resize action, see playground
But for now, MetricStore is only work in single goroutine, so it work well for now, right?
The changes affect azure monitor and relevant metricsets. The list of metricsets affected are:
monitor
container_registry
container_instance
container_service
compute_vm
compute_vm_scaleset
database_account
storage_account
A new configuration parameter is introduced
enable_batch_api
of type boolean.If set to
false
(default) nothing changes in the way the metrics are collected for these metricsets.If set to
true
:metrics of multiple resources with one api call.
Proposed commit message
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs