Many uninformative fields in API Gateway access logs #6399

Closed
hannes-ucsc opened this issue Jul 11, 2024 · 7 comments
Labels
- [priority] Medium
bug [type] A defect preventing use of the system as specified
debt [type] A defect incurring continued engineering cost
demo [process] To be demonstrated at the end of the sprint
demoed [process] Successfully demonstrated to team
infra [subject] Project infrastructure like CI/CD, build and deployment scripts
orange [process] Done by the Azul team
spike:3 [process] Spike estimate of three points

Comments

@hannes-ucsc
Member

… for example the authorizer_… and many of identity… fields.

@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Jul 11, 2024
@achave11-ucsc
Member

Assignee to consider next steps.

@hannes-ucsc
Member Author

Assignee to compile a list of fields, their cardinality and the top ten most frequent values for that field, for prod and the past four weeks. The result must be posted on Slack. A link to the Slack post must be posted here as a comment to this issue. The JSON must have the following structure:

{
    "statusCode": {
        "cardinality": 4,
        "top_ten": ["200", "400", "429", "500"]
    },
    "size": {
        "cardinality": 123456,
        "top_ten": [0, 1287, 71389, 57, 6947, 32, 9514, 3098730, 5527, 372]
    },
    
}

Assignee to document in this issue the method used to obtain the result. If a script is used, the script source should be posted here.

@hannes-ucsc hannes-ucsc removed their assignment Aug 26, 2024
@hannes-ucsc hannes-ucsc added the spike:3 [process] Spike estimate of three points label Aug 26, 2024
@achave11-ucsc
Member

Spike results posted to Slack.

@achave11-ucsc
Member

The following script runs CloudWatch Logs Insights queries under the hood. This was the only way to filter the logs and have CloudWatch compute the desired statistics, and the most effective way to get the desired results.

The first query determines the cardinality of each field by using the statistic count_distinct for each field (see count_distinct in AWS documentation for more info). This runs for the 51 fields mentioned in api_gateway.tf.json.template.py#L92-L142, which are the fields we configure for API Gateway access logs.

Then, to determine the top ten values for each field, the Log Insights query stats count(*) as count by {field_name} | sort count desc is used.

import json
import time

from azul.deployment import aws

logs = aws.client('logs')

# The 51 fields configured for API Gateway access logs
access_log_fields = [
    'requestId', 'extendedRequestId', 'accountId', 'apiId',
    'authorizer_claims_property', 'authorizer_error', 'authorizer_principalId',
    'authorizer_property', 'awsEndpointRequestId', 'awsEndpointRequestId2',
    'customDomain_basePathMatched', 'dataProcessed', 'domainName', 'domainPrefix',
    'error_message', 'error_responseType', 'httpMethod', 'identity_accountId',
    'identity_caller', 'identity_cognitoAuthenticationProvider',
    'identity_cognitoAuthenticationType', 'identity_cognitoIdentityId',
    'identity_cognitoIdentityPoolId', 'identity_principalOrgId',
    'identity_clientCert_clientCertPem', 'identity_clientCert_subjectDN',
    'identity_clientCert_issuerDN', 'identity_clientCert_serialNumber',
    'identity_clientCert_validity_notBefore', 'identity_clientCert_validity_notAfter',
    'identity_sourceIp', 'identity_user', 'identity_userAgent', 'identity_userArn',
    'integration_error', 'integration_integrationStatus', 'integration_latency',
    'integration_requestId', 'integration_status', 'integrationErrorMessage',
    'integrationLatency', 'integrationStatus', 'path', 'protocol', 'requestTime',
    'requestTimeEpoch', 'responseLatency', 'responseLength', 'routeKey', 'stage',
    'status',
]

# One count_distinct statistic per field, aliased as <field>_cardinality
cardinality_q = 'stats ' + ', '.join(
    f'count_distinct({name}) as {name}_cardinality'
    for name in access_log_fields
)


def run_query(query):
    start = 1722276000000  # Converted epoch time from date (Mon Jul 29 11:00:00 PDT 2024)
    end = 1724695200000  # Converted epoch time from date (Mon Aug 26 11:00:00 PDT 2024)
    group_names = ['/aws/apigateway/azul-service-prod', '/aws/apigateway/azul-indexer-prod']
    try:
        request = logs.start_query(logGroupNames=group_names,
                                   startTime=start,
                                   endTime=end,
                                   queryString=query,
                                   limit=10)
    except logs.exceptions.LimitExceededException:
        # Wait a little, let the queued-up queries complete (w/o ongoing concurrent requests, < 40s)
        print('Sleeping to let some of the queued up queries complete')
        time.sleep(20)
        request = run_query(query)
    return request


cardinality_fields = run_query(query=cardinality_q)
print('Running cardinality query, this will take a minute…')
time.sleep(60)  # Request matches 1,439,294 records in CW Insights console

cardinality_fields = logs.get_query_results(queryId=cardinality_fields['queryId'])
api_gateway_fields = {}
for field in cardinality_fields['results'][0]:  # … so only iterate over the 51 (fields)
    field_name, _ = field['field'].split('_cardinality')
    print('Executing top ten values query for field %r' % field_name)
    field_cardinality = field['value']
    top_ten_query = f'stats count(*) as count by {field_name} | sort count desc'
    query = run_query(top_ten_query)
    api_gateway_fields[field_name] = {'cardinality': field_cardinality, 'top_ten': query['queryId']}
    # Sleep to prevent overwhelming CloudWatch with queries
    time.sleep(5)
print('Waiting for top ten queries to complete, this will take a minute…')
time.sleep(60)
for field_keys in api_gateway_fields.values():
    top_ten = logs.get_query_results(queryId=field_keys['top_ten'])
    field_keys['top_ten'] = [v[0]['value'] for v in top_ten['results']]

print(json.dumps(api_gateway_fields, indent=4))

This may be run from Azul's project root.
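The script above relies on fixed `time.sleep(60)` waits before fetching results, which assumes every query has finished by then. As a hedged alternative, the sketch below polls `get_query_results` until the query reports a terminal status. The helper name `wait_for_results`, the polling interval and the timeout are assumptions made for illustration and are not part of the original script; `logs` is assumed to be a CloudWatch Logs client like the one obtained from `aws.client('logs')`.

```python
import time


def wait_for_results(logs, query_id, poll_interval=5.0, timeout=300.0):
    """
    Poll `get_query_results` until the query reaches a terminal status.

    Hypothetical helper: `logs` is assumed to be a CloudWatch Logs client,
    `query_id` the `queryId` returned by `start_query`.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = logs.get_query_results(queryId=query_id)
        status = response['status']
        if status == 'Complete':
            return response['results']
        if status in ('Failed', 'Cancelled', 'Timeout'):
            raise RuntimeError(f'Query {query_id} ended with status {status}')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Query {query_id} still {status} after {timeout}s')
        # Not done yet (Scheduled or Running): wait before asking again
        time.sleep(poll_interval)
```

With such a helper, each `logs.get_query_results(queryId=…)` call following a fixed sleep could be replaced by `wait_for_results(logs, …)`, so that incomplete queries are not silently read as empty results.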

@achave11-ucsc achave11-ucsc removed their assignment Aug 27, 2024
@hannes-ucsc
Member Author

hannes-ucsc commented Aug 27, 2024

Based on those results, we should be able to remove the following fields from the logs:

{
  "authorizer_claims_property": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "authorizer_error": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "authorizer_principalId": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "authorizer_property": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "awsEndpointRequestId2": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "customDomain_basePathMatched": {
    "cardinality": "1",
    "top_ten": [
      "(none)"
    ]
  },
  "dataProcessed": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_accountId": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_caller": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_cognitoAuthenticationProvider": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_cognitoAuthenticationType": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_cognitoIdentityId": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_cognitoIdentityPoolId": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_principalOrgId": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_clientCertPem": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_subjectDN": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_issuerDN": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_serialNumber": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_validity_notBefore": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_clientCert_validity_notAfter": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_user": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "identity_userArn": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "integrationErrorMessage": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  },
  "routeKey": {
    "cardinality": "1",
    "top_ten": [
      "-"
    ]
  }
}

[to_entries[]|select((.value.cardinality|tonumber==1) and (.value.top_ten[]|contains("-","(none)")))]|from_entries was the jq query used.
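For readers less familiar with jq, the filter can be approximated in Python. This is a sketch under stated assumptions, not the method that was actually used: the function name `uninformative_fields` is hypothetical, the input is assumed to be the JSON report printed by the script, and all `top_ten` values are assumed to be strings, as in the output above. Like jq's `contains` on strings, the check is a substring match.

```python
def uninformative_fields(report):
    """
    Keep the fields whose cardinality is 1 and whose top-ten values
    include a placeholder, i.e. a value containing "-" or "(none)".

    Hypothetical helper mirroring the jq filter; `report` maps field
    names to {'cardinality': ..., 'top_ten': [...]} dictionaries.
    """
    placeholders = ('-', '(none)')
    return {
        name: stats
        for name, stats in report.items()
        if int(stats['cardinality']) == 1
        and any(p in value for value in stats['top_ten'] for p in placeholders)
    }


# Sample input for illustration only; 'stage' and its value are made up
sample = {
    'routeKey': {'cardinality': '1', 'top_ten': ['-']},
    'status': {'cardinality': '4', 'top_ten': ['200', '400', '429', '500']},
    'stage': {'cardinality': '1', 'top_ten': ['prod']},
}
print(sorted(uninformative_fields(sample)))
```

A field with a single distinct value that is not a placeholder, like the made-up `stage` entry, is kept in the logs by this filter, which matches the intent of only dropping fields that never carry information.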

@dsotirho-ucsc dsotirho-ucsc added debt [type] A defect incurring continued engineering cost infra [subject] Project infrastructure like CI/CD, build and deployment scripts and removed orange [process] Done by the Azul team labels Aug 28, 2024
@github-actions github-actions bot added the orange [process] Done by the Azul team label Aug 28, 2024
@dsotirho-ucsc dsotirho-ucsc added bug [type] A defect preventing use of the system as specified and removed orange [process] Done by the Azul team labels Aug 28, 2024
@github-actions github-actions bot added the orange [process] Done by the Azul team label Aug 28, 2024
@dsotirho-ucsc
Contributor

Assignee to remove the fields listed above.

@hannes-ucsc
Member Author

hannes-ucsc commented Aug 29, 2024

For demo, show absence of fields from new log entries in API Gateway logs for indexer and service.
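One way to check this during the demo could be sketched as follows. It assumes that each access log entry is a flat JSON object keyed by the field names used in this issue, which is not confirmed here; the function name `lingering_fields` and the sample entry are hypothetical.

```python
import json


def lingering_fields(log_line, removed_fields):
    """
    Return the removed fields that still appear in a log entry.

    Assumption: `log_line` is one access log entry serialized as a flat
    JSON object whose keys are the access log field names.
    """
    entry = json.loads(log_line)
    return sorted(set(removed_fields).intersection(entry))


# Made-up log entry for illustration; real entries carry many more fields
line = json.dumps({'requestId': 'abc', 'status': '200', 'routeKey': '-'})
print(lingering_fields(line, {'routeKey', 'identity_user'}))
```

An empty list for new entries from both the indexer and the service log groups would indicate that the fields are gone.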
