fix(enriching): csv file enrichment tables no longer drop the first row #22257

Merged
merged 4 commits into from
Feb 10, 2025

Conversation

B-Schmidt
Contributor

Summary

  • Currently, file enrichment tables loaded from a CSV file without headers lose the first row of data in that file
  • Fix this by keeping the first data row in a separate variable and then chaining it into the remaining data rows when parsing columns
  • Provide unit tests for loading a csv file both with and without header row
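
The approach described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses only the standard library with a naive comma split instead of Vector's actual CSV reader, and `load_table` and its return shape are hypothetical names chosen for the example. The point it shows is that the first record is always consumed to derive column names, so in the headerless case it has to be kept and chained back in front of the remaining rows.

```rust
/// Hypothetical helper (not Vector's API): parses simple comma-separated text
/// into column names and data rows. Quoting and escaping are ignored here.
fn load_table(input: &str, include_headers: bool) -> (Vec<String>, Vec<Vec<String>>) {
    let mut records = input
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split(',').map(str::to_string).collect::<Vec<String>>());

    // The first record is always consumed to determine the column names.
    let first = records.next();

    let (headers, first_data_row) = match first {
        // Empty file: no columns and no data.
        None => (Vec::new(), None),
        // With a header row, the first record is the header and is not data.
        Some(row) if include_headers => (row, None),
        // Without a header row, columns are named "0", "1", ... and the
        // first record is kept aside because it is real data.
        Some(row) => {
            let headers = (0..row.len()).map(|i| i.to_string()).collect();
            (headers, Some(row))
        }
    };

    // Chain the saved first row (if any) in front of the remaining records.
    let rows = first_data_row.into_iter().chain(records).collect();
    (headers, rows)
}
```

With the `data.csv` from the test instructions and `include_headers` set to false, this sketch yields the columns `0` and `1` and keeps both `a,1` and `b,2` as data rows.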

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

  • Unit tests are provided in the PR
  • Create data.csv
a,1
b,2
  • Create vector.yaml
enrichment_tables:
  data:
    type: file
    file:
      encoding:
        type: csv
        include_headers: false
      path: ./data.csv

sources:
  in:
    type: stdin

transforms:
  remap:
    inputs: [in]
    type: remap
    source: |
      .row = get_enrichment_table_record("data", {"0": .message}) ?? null
      . = {"message": .message, "row": .row}

sinks:
  out:
    type: console
    inputs: [remap]
    encoding:
      codec: json
  • Run vector -c vector.yaml providing the following input:
a
b
c
  • Expected output:
{"message": "a", "row": {"0": "a", "1": "1"}}
{"message": "b", "row": {"0": "b", "1": "2"}}
{"message": "c", "row": null}

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is defined here. Some of these checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@B-Schmidt B-Schmidt requested a review from a team as a code owner January 20, 2025 15:42
Member

@pront pront left a comment

Thanks @B-Schmidt!

A more idiomatic way would be to use reader.records().peekable() but I can see how that can get complicated with the existing implementation.
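
For comparison, the `peekable()` idea from the comment above might look something like the sketch below. It is an assumption-laden illustration only: a standard-library line iterator stands in for `reader.records()`, and `load_table_peekable` is a hypothetical name. Peeking lets the headerless branch inspect the first record for its column count without consuming it, so no separate variable or chaining is needed.

```rust
/// Hypothetical helper (not Vector's API): same behavior as keeping the first
/// row aside, but implemented by peeking at the first record instead.
fn load_table_peekable(input: &str, include_headers: bool) -> (Vec<String>, Vec<Vec<String>>) {
    let mut records = input
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split(',').map(str::to_string).collect::<Vec<String>>())
        .peekable();

    let headers: Vec<String> = if include_headers {
        // Consume the header row; it is not part of the data.
        records.next().unwrap_or_default()
    } else {
        // Look at the first data row without consuming it and name the
        // columns by index.
        match records.peek() {
            Some(row) => (0..row.len()).map(|i| i.to_string()).collect(),
            None => Vec::new(),
        }
    };

    // Everything still in the iterator, including a peeked row, is data.
    (headers, records.collect())
}
```

Whether this is simpler in practice depends on how the existing loader handles errors and column parsing, which this sketch leaves out.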

@pront pront enabled auto-merge January 22, 2025 20:36
@pront pront changed the title from "fix(enrichment_tables): Fix csv file enrichment tables without header dropping the first row" to "fix(enriching): Fix csv file enrichment tables without header dropping the first row" Jan 22, 2025
@pront pront changed the title from "fix(enriching): Fix csv file enrichment tables without header dropping the first row" to "fix(enriching): csv file enrichment tables no longer drop the first row" Jan 22, 2025
auto-merge was automatically disabled February 10, 2025 12:02

Head branch was pushed to by a user without write access

@pront pront enabled auto-merge February 10, 2025 14:51
@pront pront added this pull request to the merge queue Feb 10, 2025
Merged via the queue into vectordotdev:master with commit da6886b Feb 10, 2025
42 checks passed
esensar pushed a commit to esensar/vector that referenced this pull request Feb 14, 2025
…ow (vectordotdev#22257)

* test loading enrichment tables

* fix headerless enrichment tables dropping the first row of data

* write changelog

* fix missing final newline in changelog
github-merge-queue bot pushed a commit that referenced this pull request Apr 3, 2025
…c set (#22409)

* feat(metrics): separate `expire_metrics_secs` configuration per metric set

Adds a new optional global option `expire_metrics_per_metric_set`, which enables configurable
`expire_metrics_secs` per metric set (defined with name and/or set of labels). `expire_metrics_secs`
configuration options is kept as a global default.

Closes: #19753

* Add missing docs for metrics_expiration

* Add custom `Configurable` implementation for `MetricLabelMatcherConfig`

* Fix `expire_metrics_per_metric_set` merging

* chore(deps): Bump heim from `4925b53` to `f3537d9` (#22399)

Bumps [heim](https://github.com/vectordotdev/heim) from `4925b53` to `f3537d9`.
- [Commits](vectordotdev/heim@4925b53...f3537d9)

---
updated-dependencies:
- dependency-name: heim
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump vrl from `2ccb98e` to `7612a8b` (#22401)

* chore(deps): Bump vrl from `2ccb98e` to `7612a8b`

Bumps [vrl](https://github.com/vectordotdev/vrl) from `2ccb98e` to `7612a8b`.
- [Commits](vectordotdev/vrl@2ccb98e...7612a8b)

---
updated-dependencies:
- dependency-name: vrl
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* dd-rust-license-tool write

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>

* chore(deps): Bump lru from 0.12.5 to 0.13.0 (#22397)

Bumps [lru](https://github.com/jeromefroe/lru-rs) from 0.12.5 to 0.13.0.
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md)
- [Commits](jeromefroe/lru-rs@0.12.5...0.13.0)

---
updated-dependencies:
- dependency-name: lru
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump bytes from 1.9.0 to 1.10.0 (#22404)

Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump tempfile from 3.15.0 to 3.16.0 (#22400)

Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.15.0 to 3.16.0.
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](Stebalien/tempfile@v3.15.0...v3.16.0)

---
updated-dependencies:
- dependency-name: tempfile
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(enriching): csv file enrichment tables no longer drop the first row (#22257)

* test loading enrichment tables

* fix headerless enrichment tables dropping the first row of data

* write changelog

* fix missing final newline in changelog

* chore(deps): Bump the patches group with 3 updates (#22393)

Bumps the patches group with 3 updates: [aws-types](https://github.com/smithy-lang/smithy-rs), [prost-reflect](https://github.com/andrewhickman/prost-reflect) and [hickory-proto](https://github.com/hickory-dns/hickory-dns).


Updates `aws-types` from 1.3.4 to 1.3.5
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

Updates `prost-reflect` from 0.14.5 to 0.14.6
- [Changelog](https://github.com/andrewhickman/prost-reflect/blob/main/CHANGELOG.md)
- [Commits](andrewhickman/prost-reflect@0.14.5...0.14.6)

Updates `hickory-proto` from 0.24.2 to 0.24.3
- [Release notes](https://github.com/hickory-dns/hickory-dns/releases)
- [Changelog](https://github.com/hickory-dns/hickory-dns/blob/v0.24.3/CHANGELOG.md)
- [Commits](hickory-dns/hickory-dns@v0.24.2...v0.24.3)

---
updated-dependencies:
- dependency-name: aws-types
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: patches
- dependency-name: prost-reflect
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: patches
- dependency-name: hickory-proto
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: patches
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(ci): Bump docker/setup-qemu-action from 3.3.0 to 3.4.0 (#22412)

Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3.3.0 to 3.4.0.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](docker/setup-qemu-action@v3.3.0...v3.4.0)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(ci): Bump docker/setup-buildx-action from 3.8.0 to 3.9.0 (#22411)

Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3.8.0 to 3.9.0.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v3.8.0...v3.9.0)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump data-encoding from 2.7.0 to 2.8.0 (#22402)

Bumps [data-encoding](https://github.com/ia0/data-encoding) from 2.7.0 to 2.8.0.
- [Commits](ia0/data-encoding@v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: data-encoding
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump hickory-proto from 0.24.2 to 0.24.3 in the cargo group (#22415)

Bumps the cargo group with 1 update: [hickory-proto](https://github.com/hickory-dns/hickory-dns).


Updates `hickory-proto` from 0.24.2 to 0.24.3
- [Release notes](https://github.com/hickory-dns/hickory-dns/releases)
- [Changelog](https://github.com/hickory-dns/hickory-dns/blob/v0.24.3/CHANGELOG.md)
- [Commits](hickory-dns/hickory-dns@v0.24.2...v0.24.3)

---
updated-dependencies:
- dependency-name: hickory-proto
  dependency-type: direct:production
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(ci): only run integration tests when secrets are available (#22414)

* fix(ci): remove integration test suite from PRs

* proper fix

* test commit

* Revert "test commit"

This reverts commit 8dc18e9.

* Fix docs issues

* Fix expire_secs docs issues

* Regenerate docs

* Add changelog entry

* Credit Quad9DNS in changelog

* Move changelog entry to `changelog.d` directory

* Update lib/vector-core/src/config/global_options.rs

Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>

* Simplify per metric set expiration secs

* Generate component docs

* Remove single variant

* Remove Single usage in tests

* Fix failing tests

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Co-authored-by: B-Schmidt <37669233+B-Schmidt@users.noreply.github.com>