Many uninformative fields in API Gateway access logs #6399
Comments
Assignee to consider next steps. |
Assignee to compile a list of fields, their cardinality and the top ten most frequent values for each field, for prod and the past four weeks. The result must be posted on Slack. A link to the Slack post must be posted here as a comment to this issue. The JSON must have the following structure:
{
    "statusCode": {
        "cardinality": 4,
        "top_ten": ["200", "400", "429", "500"]
    },
    "size": {
        "cardinality": 123456,
        "top_ten": [0, 1287, 71389, 57, 6947, 32, 9514, 3098730, 5527, 372]
    },
    …
}
Assignee to document in this issue the method used to obtain the result. If a script is used, the script source should be posted here. |
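To make the requested structure concrete, here is a minimal sketch of how one field's entry could be computed locally, assuming the field's values have already been extracted into a list. `summarize_field` is a hypothetical helper for illustration and is not part of Azul.

```python
from collections import Counter


def summarize_field(values):
    """Return the cardinality and up to ten most frequent values of one field."""
    counts = Counter(values)
    return {
        # Number of distinct values observed for the field
        'cardinality': len(counts),
        # most_common() orders by descending frequency
        'top_ten': [value for value, _ in counts.most_common(10)],
    }


summary = {'statusCode': summarize_field(['200', '200', '400', '200', '500', '400'])}
```

For four weeks of prod logs the values would not fit comfortably in memory, so this only illustrates the shape of the result, not a practical way to obtain it.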
Spike results posted to Slack. |
The following script runs CloudWatch Logs Insights queries under the hood. This was the only way to filter the logs and have CloudWatch compute the desired statistics, and it was the most effective way to get the results. The first query determines the cardinality of each field using the count_distinct statistic. Then, to determine the top ten values for each field, the Logs Insights query stats count(*) as count by {field_name} | sort count desc is run once per field.
import json
from azul.deployment import aws
import time
logs = aws.client('logs')
# The 51 fields present in the API Gateway access log entries
fields = [
    'requestId', 'extendedRequestId', 'accountId', 'apiId',
    'authorizer_claims_property', 'authorizer_error',
    'authorizer_principalId', 'authorizer_property',
    'awsEndpointRequestId', 'awsEndpointRequestId2',
    'customDomain_basePathMatched', 'dataProcessed', 'domainName',
    'domainPrefix', 'error_message', 'error_responseType', 'httpMethod',
    'identity_accountId', 'identity_caller',
    'identity_cognitoAuthenticationProvider',
    'identity_cognitoAuthenticationType', 'identity_cognitoIdentityId',
    'identity_cognitoIdentityPoolId', 'identity_principalOrgId',
    'identity_clientCert_clientCertPem', 'identity_clientCert_subjectDN',
    'identity_clientCert_issuerDN', 'identity_clientCert_serialNumber',
    'identity_clientCert_validity_notBefore',
    'identity_clientCert_validity_notAfter', 'identity_sourceIp',
    'identity_user', 'identity_userAgent', 'identity_userArn',
    'integration_error', 'integration_integrationStatus',
    'integration_latency', 'integration_requestId', 'integration_status',
    'integrationErrorMessage', 'integrationLatency', 'integrationStatus',
    'path', 'protocol', 'requestTime', 'requestTimeEpoch',
    'responseLatency', 'responseLength', 'routeKey', 'stage', 'status',
]
# One count_distinct() per field, each aliased as <field>_cardinality
cardinality_q = 'stats ' + ', '.join(
    f'count_distinct({field}) as {field}_cardinality' for field in fields
)
def run_query(query):
    start = 1722276000000  # Converted epoch time from date (Mon Jul 29 11:00:00 PDT 2024)
    end = 1724695200000  # Converted epoch time from date (Mon Aug 26 11:00:00 PDT 2024)
    group_names = ['/aws/apigateway/azul-service-prod', '/aws/apigateway/azul-indexer-prod']
    try:
        request = logs.start_query(logGroupNames=group_names,
                                   startTime=start,
                                   endTime=end,
                                   queryString=query,
                                   limit=10)
    except logs.exceptions.LimitExceededException:
        # Wait a little, let the queued-up queries complete (w/o ongoing concurrent requests, < 40s)
        print('Sleeping to let some of the queued up queries complete')
        time.sleep(20)
        request = run_query(query)
    return request


cardinality_fields = run_query(query=cardinality_q)
print('Running cardinality query, this will take a minute…')
time.sleep(60)  # Request matches 1,439,294 records in CW Insights console
cardinality_fields = logs.get_query_results(queryId=cardinality_fields['queryId'])
api_gateway_fields = {}
for field in cardinality_fields['results'][0]:  # … so only iterate over the 51 (fields)
    field_name, _ = field['field'].split('_cardinality')
    print('Executing top ten values query for field %r' % field_name)
    field_cardinality = field['value']
    top_ten_query = f'stats count(*) as count by {field_name} | sort count desc'
    query = run_query(top_ten_query)
    # Store the query ID for now, it is replaced with the actual values below
    api_gateway_fields[field_name] = {'cardinality': field_cardinality, 'top_ten': query['queryId']}
    # Sleep to prevent overwhelming CloudWatch with queries
    time.sleep(5)
print('Waiting for top ten queries to complete, this will take a minute…')
time.sleep(60)
for field_keys in api_gateway_fields.values():
    top_ten = logs.get_query_results(queryId=field_keys['top_ten'])
    # The first column of each result row is the value of the grouped field
    field_keys['top_ten'] = [v[0]['value'] for v in top_ten['results']]
print(json.dumps(api_gateway_fields, indent=4))
This may be run from Azul's project root. |
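The script above waits a fixed 60 seconds before fetching results. As a possible alternative, the response of `get_query_results` carries a `status` field that can be polled instead. The following is only a sketch: `wait_for_query` and its `poll_interval` and `timeout` parameters are hypothetical and not part of the script that produced the posted results.

```python
import time


def wait_for_query(logs, query_id, poll_interval=5.0, timeout=600.0):
    """Poll a Logs Insights query until it completes and return its result rows.

    `logs` is assumed to be a CloudWatch Logs client such as the one
    returned by aws.client('logs') in the script above.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = logs.get_query_results(queryId=query_id)
        status = response['status']
        if status == 'Complete':
            return response['results']
        if status in ('Failed', 'Cancelled', 'Timeout'):
            raise RuntimeError(f'Query {query_id} ended with status {status}')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Query {query_id} still {status} after {timeout}s')
        # Query is still scheduled or running, check again shortly
        time.sleep(poll_interval)
```

With such a helper, `logs.get_query_results(queryId=...)` after a fixed sleep could be replaced by `wait_for_query(logs, query_id)`, which avoids reading partial results from a query that is still running.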
Based on those results, we should be able to remove the following fields from the logs:
{
"authorizer_claims_property": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"authorizer_error": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"authorizer_principalId": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"authorizer_property": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"awsEndpointRequestId2": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"customDomain_basePathMatched": {
"cardinality": "1",
"top_ten": [
"(none)"
]
},
"dataProcessed": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_accountId": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_caller": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_cognitoAuthenticationProvider": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_cognitoAuthenticationType": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_cognitoIdentityId": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_cognitoIdentityPoolId": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_principalOrgId": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_clientCertPem": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_subjectDN": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_issuerDN": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_serialNumber": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_validity_notBefore": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_clientCert_validity_notAfter": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_user": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"identity_userArn": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"integrationErrorMessage": {
"cardinality": "1",
"top_ten": [
"-"
]
},
"routeKey": {
"cardinality": "1",
"top_ten": [
"-"
]
}
}
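A list like the one above can be derived mechanically from the script's JSON output by keeping only the fields with a single distinct value. This is a minimal sketch, assuming the output has been loaded into a dict; `uninformative_fields` is a hypothetical helper, and the sample input reuses the status values from the example structure in this issue rather than real results.

```python
def uninformative_fields(summary):
    """Return the names of the fields that only ever take one value."""
    return [
        name
        for name, stats in summary.items()
        # Logs Insights reports the cardinality as a string, so normalize it
        if int(stats['cardinality']) == 1
    ]


sample = {
    'routeKey': {'cardinality': '1', 'top_ten': ['-']},
    'status': {'cardinality': '4', 'top_ten': ['200', '400', '429', '500']},
    'identity_user': {'cardinality': '1', 'top_ten': ['-']},
}
candidates = uninformative_fields(sample)
```

A cardinality of one only says that the field was constant over the sampled period, so the single value in `top_ten` is still worth a glance before a field is dropped.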
|
Assignee to remove the fields listed above. |
For demo, show absence of fields from new log entries in API Gateway logs for indexer and service. |
… for example the authorizer_… and many of the identity… fields.