[metricbeat] [gcp] add billing additional fields #36902

Merged
merged 13 commits into from
Nov 3, 2023

Conversation

gpop63

@gpop63 gpop63 commented Oct 18, 2023

Overview

Added functionality to check if a billing data table is detailed, using the prefix gcp_billing_export_resource_v1_ for identification.

Added a row struct in the code to manage billing data fields, replacing the previous method that accessed fields by indexes. This change makes the data handling clearer and reduces the likelihood of errors due to misindexing.
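
As a rough illustration of why named fields are less error-prone than positional access, the sketch below contrasts the two styles. The field names follow the columns discussed in this PR, but the `billingRow` type and the `totalByIndex` helper are hypothetical and are not the code from the change itself.

```go
package main

import "fmt"

// billingRow is a hypothetical struct mirroring some of the billing columns
// mentioned in this PR; the real type in the change may differ.
type billingRow struct {
	InvoiceMonth string
	ProjectID    string
	CostType     string
	SkuID        string
	ServiceID    string
	Total        float64
}

// totalByIndex shows the older positional style: the caller must know that
// the total is the last column, and a reordered query breaks it silently.
func totalByIndex(values []interface{}) (float64, bool) {
	if len(values) == 0 {
		return 0, false
	}
	total, ok := values[len(values)-1].(float64)
	return total, ok
}

func main() {
	// Positional access: correctness depends on column order.
	positional := []interface{}{"202310", "robust-catalyst-399814", "regular", 5.563586}
	if total, ok := totalByIndex(positional); ok {
		fmt.Println("positional total:", total)
	}

	// Named access: each value is tied to a field name.
	named := billingRow{
		InvoiceMonth: "202310",
		ProjectID:    "robust-catalyst-399814",
		CostType:     "regular",
		SkuID:        "F449-33EC-A5EF",
		ServiceID:    "6F81-5844-456A",
		Total:        1.724385,
	}
	fmt.Println("named total:", named.SkuID, named.Total)
}
```

With the struct, adding or reordering columns in the query only requires updating the place where the row is populated, rather than every index used downstream.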

Previously, the billing dashboard used a max aggregation to display daily costs, which worked because there was one document per day. Now that multiple documents are generated per day, a sum aggregation is needed to add up the total cost across all of them.

Table formats:

  • Standard cost usage table format: gcp_billing_export_v1_<BILLING_ACCOUNT_ID>
  • Detailed cost usage table format: gcp_billing_export_resource_v1_<BILLING_ACCOUNT_ID>
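
The prefix check described in the overview can be sketched as follows. This is a minimal illustration, assuming a plain string-prefix comparison; the function name `isDetailedTable` and the example table names are made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// detailedTablePrefix is the prefix named in the PR description for
// detailed (resource-level) billing export tables.
const detailedTablePrefix = "gcp_billing_export_resource_v1_"

// isDetailedTable reports whether tableName looks like a detailed cost
// usage table. The function name is hypothetical.
func isDetailedTable(tableName string) bool {
	return strings.HasPrefix(tableName, detailedTablePrefix)
}

func main() {
	// Example table names are illustrative only.
	for _, name := range []string{
		"gcp_billing_export_v1_EXAMPLE",
		"gcp_billing_export_resource_v1_EXAMPLE",
	} {
		fmt.Printf("%s detailed=%t\n", name, isDetailedTable(name))
	}
}
```

Because the standard prefix `gcp_billing_export_v1_` is not a prefix of the detailed one, checking for the detailed prefix alone is enough to tell the two formats apart.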

Common fields: Additional fields applicable to both standard and detailed cost data:

  • sku.id
  • sku.description
  • service.id
  • service.description
  • tags

Detailed fields:

  • price.effective_price

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 18, 2023
@mergify mergify bot assigned gpop63 Oct 18, 2023
mergify bot commented Oct 18, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gpop63? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@gpop63 gpop63 requested a review from ishleenk17 October 18, 2023 23:06
elasticmachine commented Oct 18, 2023

💚 Build Succeeded

Build stats

  • Duration: 28 min 6 sec

❕ Flaky test report

No test was executed to be analysed.

return createDetailedQuery(tableName, month, costType)
}

return createStandardQuery(tableName, month, costType)

Let's add an else branch for the case where the table doesn't match any pattern.

Contributor Author

The table pattern will always match the standard pattern, even if the user forgets to set it.

if m.config.TablePattern == "" {
	m.logger.Warn("table_pattern is not set in config, \"gcp_billing_export_v1\" will be used by default.")
	m.config.TablePattern = "gcp_billing_export_v1"
}

@ishleenk17 ishleenk17 requested a review from zmoog October 19, 2023 08:20
@gpop63 gpop63 force-pushed the add_gcp-billing_fields branch from b7af6f9 to 7c51bc8 Compare October 19, 2023 09:55
events = append(events, createEvents(row, m.config.ProjectID))
case DetailedTableRowLen:

Wouldn't it be better to switch on the type of table being queried rather than on the length of the row?
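
One way to read this suggestion is to carry an explicit table kind alongside the query and switch on that, instead of inferring the kind from the row length. The sketch below is only an illustration: `tableKind`, its constants, and `extraFields` are hypothetical names, and the field lists are taken from the PR description.

```go
package main

import "fmt"

// tableKind is a hypothetical enum identifying which billing export table
// is being queried. It is assumed to be decided once, when the query is built.
type tableKind int

const (
	standardTable tableKind = iota
	detailedTable
)

// extraFields returns the additional fields expected for each table kind,
// following the lists in the PR description. Unknown kinds return nil so
// the caller can handle them explicitly.
func extraFields(kind tableKind) []string {
	common := []string{"sku.id", "sku.description", "service.id", "service.description", "tags"}
	switch kind {
	case detailedTable:
		return append(common, "price.effective_price")
	case standardTable:
		return common
	default:
		return nil
	}
}

func main() {
	fmt.Println("standard:", extraFields(standardTable))
	fmt.Println("detailed:", extraFields(detailedTable))
}
```

Switching on an explicit kind keeps working if a later change adds a column to one of the queries, whereas a check on the row length would have to be updated in step.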

@gpop63 gpop63 force-pushed the add_gcp-billing_fields branch from f61ae91 to 6a801bc Compare October 19, 2023 20:38
gpop63 commented Oct 19, 2023

Standard document example:

{
  "_index": ".ds-metricbeat-8.12.0-2023.10.19-000001",
  "_id": "eh2USYsBN2wa5YEjwRKW",
  "_score": 1,
  "_source": {
    "@timestamp": "2023-10-19T20:16:01.524Z",
    "metricset": {
      "name": "billing",
      "period": 60000
    },
    "service": {
      "type": "gcp"
    },
    "gcp": {
      "billing": {
        "service_description": "Compute Engine",
        "invoice_month": "202310",
        "project_id": "robust-catalyst-399814",
        "billing_account_id": "011702-58A742-BEB4E7",
        "total": 0,
        "sku_id": "0D56-2F80-52A5",
        "cost_type": "regular",
        "service_id": "6F81-5844-456A",
        "sku_description": "Network Inter Region Ingress from Jakarta to Americas",
        "tags": {},
        "project_name": "My Project 70130"
      }
    },
    "cloud.project.id": "robust-catalyst-399814",
    "cloud.project.name": "My Project 70130",
    "host": {
      "os": {
        "platform": "ubuntu",
        "version": "22.04.2 LTS (Jammy Jellyfish)",
        "family": "debian",
        "name": "Ubuntu",
        "kernel": "5.10.102.1-microsoft-standard-WSL2",
        "codename": "jammy",
        "type": "linux"
      },
      "containerized": false,
      "ip": [
        "172.30.189.193",
        "fe80::215:5dff:fecc:d3ab"
      ],
      "mac": [
        "00-15-5D-CC-D3-AB",
        "5A-2B-CE-65-CF-33",
        "62-74-42-62-2C-60"
      ],
      "name": "gab",
      "hostname": "gab",
      "architecture": "x86_64"
    },
    "cloud.account.id": "011702-58A742-BEB4E7",
    "event": {
      "duration": 3895440085,
      "dataset": "gcp.billing",
      "module": "gcp"
    },
    "cloud.provider": "gcp",
    "ecs": {
      "version": "8.0.0"
    },
    "agent": {
      "id": "afdc4ab4-827a-4058-97e7-267618840f1d",
      "name": "gab",
      "type": "metricbeat",
      "version": "8.12.0",
      "ephemeral_id": "a071aaea-4ff5-4ec7-b7b0-672f358d2438"
    }
  }
}

Detailed document example:

{
  "_index": ".ds-metricbeat-8.12.0-2023.10.19-000001",
  "_id": "3h2RSYsBN2wa5YEjUBFz",
  "_score": 1,
  "_source": {
    "@timestamp": "2023-10-19T20:12:17.902Z",
    "event": {
      "dataset": "gcp.billing",
      "module": "gcp",
      "duration": 3092063008
    },
    "metricset": {
      "period": 60000,
      "name": "billing"
    },
    "service": {
      "type": "gcp"
    },
    "host": {
      "ip": [
        "172.30.189.193",
        "fe80::215:5dff:fecc:d3ab"
      ],
      "mac": [
        "00-15-5D-CC-D3-AB",
        "5A-2B-CE-65-CF-33",
        "62-74-42-62-2C-60"
      ],
      "name": "gab",
      "hostname": "gab",
      "architecture": "x86_64",
      "os": {
        "name": "Ubuntu",
        "kernel": "5.10.102.1-microsoft-standard-WSL2",
        "codename": "jammy",
        "type": "linux",
        "platform": "ubuntu",
        "version": "22.04.2 LTS (Jammy Jellyfish)",
        "family": "debian"
      },
      "containerized": false
    },
    "gcp": {
      "billing": {
        "effective_price": 0.00292353,
        "tags": {
          "compute.googleapis.com/machine_spec": "e2-medium",
          "compute.googleapis.com/memory": "4096",
          "compute.googleapis.com/cores": "2"
        },
        "billing_account_id": "011702-58A742-BEB4E7",
        "cost_type": "regular",
        "sku_id": "F449-33EC-A5EF",
        "sku_description": "E2 Instance Ram running in Americas",
        "service_description": "Compute Engine",
        "invoice_month": "202310",
        "project_name": "My Project 70130",
        "service_id": "6F81-5844-456A",
        "project_id": "robust-catalyst-399814",
        "total": 1.700642
      }
    },
    "cloud.provider": "gcp",
    "cloud.project.id": "robust-catalyst-399814",
    "cloud.account.id": "011702-58A742-BEB4E7",
    "agent": {
      "ephemeral_id": "dac27eeb-f92c-43be-a0e6-ad3863bc5a27",
      "id": "afdc4ab4-827a-4058-97e7-267618840f1d",
      "name": "gab",
      "type": "metricbeat",
      "version": "8.12.0"
    },
    "ecs": {
      "version": "8.0.0"
    },
    "cloud.project.name": "My Project 70130"
  }
}

@gpop63 gpop63 added the Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team label Oct 19, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 19, 2023
@gpop63 gpop63 marked this pull request as ready for review October 19, 2023 22:24
@gpop63 gpop63 requested a review from a team as a code owner October 19, 2023 22:24
Implemented a check to differentiate between standard and detailed cost usage tables
Refactoring the 'row' structure used for storing BigQuery results, optimizing data handling
Added support for tags
Introduced new cost usage data fields
fix older tests
@gpop63 gpop63 force-pushed the add_gcp-billing_fields branch from 54e857e to 61e984a Compare October 22, 2023 19:44
lucian-ioan
lucian-ioan previously approved these changes Oct 23, 2023
@lucian-ioan lucian-ioan dismissed their stale review October 23, 2023 11:24

SQL query is not appropriate for billing data

@zmoog zmoog left a comment

The change LGTM, but we also need to update the docs.

We need to explain the "standard" vs. "detailed" tables and how to set up the metricset to get one or the other.

We should highlight which extra fields you get from the detailed table.

We should show a sample document and configuration for both standard and detailed cases at https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-gcp-billing.html

@zmoog zmoog left a comment

We should update the docs before merging the PR.

gpop63 commented Oct 23, 2023

The original implementation grouped the data by a few fields, resulting in a single document every 24 hours. The updated implementation allows for a more granular view of the data, with each document containing detailed information about each service, SKU, etc.

In total there are 70 rows, but I only copied the ones that have total_exact greater than 0, meaning that there would be 70 documents created in Elasticsearch.

Updated implementation

| Row | invoice_month | project_id | project_name | billing_account_id | cost_type | sku_id | sku_description | service_id | service_description | effective_price | tags_string | total_exact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | CF4E-A0C7-E3BF | E2 Instance Core running in Americas | 6F81-5844-456A | Compute Engine | 0.02181159 | compute.googleapis.com/cores:2,compute.googleapis.com/machine_spec:e2-medium,compute.googleapis.com/memory:4096 | 3.216356 |
| 2 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | F449-33EC-A5EF | E2 Instance Ram running in Americas | 6F81-5844-456A | Compute Engine | 0.00292353 | compute.googleapis.com/cores:2,compute.googleapis.com/machine_spec:e2-medium,compute.googleapis.com/memory:4096 | 1.724385 |
| 3 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 6AE1-525F-8B80 | Balanced PD Capacity | 6F81-5844-456A | Compute Engine | 0.1 | | 0.615075 |
| 4 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 9DE9-9092-B3BC | Network Internet Egress from Americas to China | 6F81-5844-456A | Compute Engine | 0.23 | | 0.005543 |
| 5 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | DE9E-AFBC-A15A | Network Inter Zone Egress | 6F81-5844-456A | Compute Engine | 0.01 | | 0.001505 |
| 6 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | FDBC-6E3B-D4D8 | Network Internet Egress from Americas to Australia | 6F81-5844-456A | Compute Engine | 0.19 | | 0.000561 |
| 7 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | AED0-3315-7B11 | Networking Traffic Egress GCP Replication within Northern America | 95FF-2EF5-5EA1 | Cloud Storage | 0.02 | | 0.000104 |
| 8 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 0D5D-6E23-4250 | Standard Storage US Multi-region | 95FF-2EF5-5EA1 | Cloud Storage | 0.026 | | 1.9e-05 |
| 9 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 7C79-34BD-F61B | Network Inter Region Egress from Americas to Jakarta | 6F81-5844-456A | Compute Engine | 0.15 | | 1.5e-05 |
| 10 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 984A-1F27-2D1F | Network Egress via Carrier Peering Network - Americas Based | 6F81-5844-456A | Compute Engine | 0.04 | | 1.3e-05 |
| 11 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 9ADA-9AED-1B24 | Multi-Region Standard Class A Operations | 95FF-2EF5-5EA1 | Cloud Storage | 0.01 | | 1e-05 |

Current implementation

| Row | invoice_month | project_id | project_name | billing_account_id | cost_type | total_exact |
|---|---|---|---|---|---|---|
| 1 | 202310 | robust-catalyst-399814 | My Project 70130 | 011702-58A742-BEB4E7 | regular | 5.563586 |
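
The two result sets above also show why the dashboard needs a sum rather than a max once there is one document per SKU. The sketch below uses the eleven non-zero total_exact values from the updated implementation (the remaining rows are described as zero); the helper names `sum` and `maxOf` are chosen for this example only.

```go
package main

import "fmt"

// nonZeroTotals holds the total_exact values of the eleven rows listed for
// the updated implementation.
var nonZeroTotals = []float64{
	3.216356, 1.724385, 0.615075, 0.005543, 0.001505, 0.000561,
	0.000104, 1.9e-05, 1.5e-05, 1.3e-05, 1e-05,
}

// sum adds up all values, which is what a sum aggregation over the
// per-SKU documents would report.
func sum(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// maxOf returns the largest value, which is what a max aggregation over
// the same documents would report.
func maxOf(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	m := values[0]
	for _, v := range values[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

func main() {
	fmt.Printf("sum=%.6f max=%.6f\n", sum(nonZeroTotals), maxOf(nonZeroTotals))
	// prints "sum=5.563586 max=3.216356"
}
```

The sum reproduces the single total of 5.563586 from the current implementation, while a max over the per-SKU documents would report only the largest single line item.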

@gpop63 gpop63 force-pushed the add_gcp-billing_fields branch from 40f1842 to eb995a7 Compare October 24, 2023 19:25
@ishleenk17

@gpop63: Is this because we are now also grouping by fields such as sku.id and service.id in the query?

gpop63 commented Oct 25, 2023

@ishleenk17 Yes, adding more fields to the GROUP BY clause, like sku.id and service.id, increases the granularity of the grouping. This creates more unique combinations to group by, hence resulting in a higher number of rows in the query results.
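
The effect of a wider grouping key can be sketched outside SQL as well. In the example below, the `usage` type, the `groupBy` helper, and the cost values are made up for illustration; only the project, service, and SKU identifiers come from the query output in this thread.

```go
package main

import "fmt"

// usage is a hypothetical flattened billing line item.
type usage struct {
	ProjectID string
	ServiceID string
	SkuID     string
	Cost      float64
}

// sampleUsage contains illustrative line items; the costs are invented.
var sampleUsage = []usage{
	{"robust-catalyst-399814", "6F81-5844-456A", "CF4E-A0C7-E3BF", 1.0},
	{"robust-catalyst-399814", "6F81-5844-456A", "CF4E-A0C7-E3BF", 2.0},
	{"robust-catalyst-399814", "6F81-5844-456A", "F449-33EC-A5EF", 0.5},
	{"robust-catalyst-399814", "95FF-2EF5-5EA1", "0D5D-6E23-4250", 0.25},
}

// groupBy sums Cost per key, mimicking SUM(cost) ... GROUP BY key.
func groupBy(rows []usage, key func(usage) string) map[string]float64 {
	out := make(map[string]float64)
	for _, r := range rows {
		out[key(r)] += r.Cost
	}
	return out
}

func main() {
	// Coarse key: one group per project.
	byProject := groupBy(sampleUsage, func(u usage) string { return u.ProjectID })
	// Finer key: one group per project, service, and SKU combination.
	bySku := groupBy(sampleUsage, func(u usage) string {
		return u.ProjectID + "|" + u.ServiceID + "|" + u.SkuID
	})
	fmt.Println("groups by project:", len(byProject))
	fmt.Println("groups by project+service+sku:", len(bySku))
}
```

The total cost is the same under both keys; only the number of result rows, and therefore the number of documents, changes.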

@ishleenk17

Thanks, I hope this is in line with the grouping activity going on on the TSDB front.

@ishleenk17

Shall we create a separate ticket for changes to the dashboards?
Some additions to the dashboards:

  • Lens visualisations where a customer can view cost per SKU or cost per service
  • Filtering by tags?
  • Effective Price view

Also, some revamping would be needed to the existing dashboard.
Maybe as part of this PR we can ensure that nothing breaks in the current dashboard.

And add the other visualisations to the dashboard as part of another PR.

@zmoog : Thoughts ?

zmoog commented Oct 27, 2023

Shall we create a separate ticket for changes to the dashboards?

Yeah, I would make sure the existing dashboard continues to work as expected. We can leverage the new fields in a different PR.

mergify bot commented Oct 27, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b add_gcp-billing_fields upstream/add_gcp-billing_fields
git merge upstream/main
git push upstream add_gcp-billing_fields

@ishleenk17

@gpop63: Let's create a separate ticket for creating updated dashboards with the new change.
Can you please point out what is not working in the current dashboard with the new changes?

@ishleenk17

@gpop63: Changing nested to array will also change the generated event. Let's update that as well.
Also, please update the dashboard.

zmoog commented Oct 31, 2023

@gpop63, why are we changing the tags structure and the aggregation type for the gcp.billing.total field (isn't this one of the preexisting fields)?

gpop63 commented Oct 31, 2023

@zmoog If the tags field is an object, it is not possible to add an input control group in the dashboard for filtering on tag keys and/or values, because of the dynamic mapping.

The field definition below enables filtering by tags keys and/or values because there is a mapping which allows us to choose from either gcp.billing.tags.key or gcp.billing.tags.value in the input control group. I provided more details here.

- name: tags
  type: nested
  fields:
  - name: key
    type: keyword
  - name: value
    type: keyword
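
To produce events that fit a key/value mapping like the one above, the tags have to be emitted as a list of pairs rather than as a map. The sketch below assumes the tags arrive as the comma-separated `key:value` string seen in the tags_string column of the query output earlier in this thread; the `tag` type and `parseTags` helper are hypothetical and may not match the code in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// tag is a hypothetical key/value pair matching the "key" and "value"
// sub-fields of the field definition.
type tag struct {
	Key   string
	Value string
}

// parseTags splits a "k1:v1,k2:v2" string into a slice of tags.
// Entries without a colon are skipped; an empty input yields nil.
func parseTags(s string) []tag {
	if s == "" {
		return nil
	}
	var tags []tag
	for _, pair := range strings.Split(s, ",") {
		// Split on the first colon only, so values may contain colons.
		parts := strings.SplitN(pair, ":", 2)
		if len(parts) != 2 {
			continue
		}
		tags = append(tags, tag{Key: parts[0], Value: parts[1]})
	}
	return tags
}

func main() {
	in := "compute.googleapis.com/cores:2,compute.googleapis.com/machine_spec:e2-medium,compute.googleapis.com/memory:4096"
	for _, t := range parseTags(in) {
		fmt.Printf("key=%s value=%s\n", t.Key, t.Value)
	}
}
```

With fixed `key` and `value` sub-fields, every tag shares the same two mapped fields, which is what makes them selectable in a dashboard control regardless of which tag names appear in the data.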

zmoog commented Nov 2, 2023

Thanks for the clarification.

zmoog commented Nov 2, 2023

@gpop63, is this PR going to impact elastic/integrations#5665 or make it obsolete?

@ishleenk17 ishleenk17 left a comment

Let's make the PR slightly more descriptive about the changes it contains.
Otherwise, looks good!

gpop63 commented Nov 3, 2023

@gpop63, is this PR going to impact elastic/integrations#5665 or make it obsolete?

@zmoog Yeah, kind of. This PR probably made that one obsolete. Also the aggregation change to last_value will not work since we have to use sum after this change.

minor changes
gpop63 commented Nov 3, 2023

/test

@gpop63 gpop63 merged commit 5b77684 into elastic:main Nov 3, 2023
8 checks passed
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* GCP Billing Enhancement
Implemented a check to differentiate between standard and detailed cost usage tables
Refactoring the 'row' structure used for storing BigQuery results, optimizing data handling
Added support for tags
Introduced new cost usage data fields

* add tests
fix older tests

* add changelog entry

* update docs and fields

* add sku and service ids in hash

* improve docs

* update fields types

* use array instead of map for tags
include tags in the event signature

* update sample doc

* use sum instead of max for dashboards

* fix label
minor changes
Labels
Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Additional Metrics in GCP Billing Module
6 participants