Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: When OOO native histograms are disabled it should be a client error #9567

Merged
merged 4 commits into from
Oct 8, 2024

Conversation

gotjosh
Copy link
Contributor

@gotjosh gotjosh commented Oct 8, 2024

What this PR does

When ingesters replay data from Kafka, if they receive a non-client error, they will continue to attempt to push the batch to the TSDB.

In this PR, I'm translating Native Histogram OOO errors into client errors to ensure that we can continue with the next set of write requests when replaying data.

This was introduced as part of #7175

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

When ingesters replay data from Kafka, if they receive a non-client error they will continue to attempt to push the batch to the TSDB.

In this PR, I'm translating Native Histogram OOO errors into client errors to ensure that we can continue with the next set of write requests when replaying data.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh gotjosh requested a review from a team as a code owner October 8, 2024 15:54
@gotjosh gotjosh requested review from fionaliao and krajorama October 8, 2024 15:55
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh gotjosh requested a review from tacole02 as a code owner October 8, 2024 17:08
Copy link
Contributor

@fionaliao fionaliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm having a look at why the OOO integration test didn't catch this issue. I think it's a problem with the test rather than the code, so I think that can be investigated separately.

docs/sources/mimir/manage/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
gotjosh and others added 2 commits October 8, 2024 18:46
Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh gotjosh merged commit f737a53 into main Oct 8, 2024
29 checks passed
@gotjosh gotjosh deleted the turn-ooo-native-histograms-error-into-client-errors branch October 8, 2024 18:26
grafanabot pushed a commit that referenced this pull request Oct 8, 2024
… error (#9567)

* When OOO native histograms are disabled it should be a client error

When ingesters replay data from Kafka, if they receive a non-client error they will continue to attempt to push the batch to the TSDB.

In this PR, I'm translating Native Histogram OOO errors into client errors to ensure that we can continue with the next set of write requests when replaying data.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Address review feedback

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/mimir/manage/mimir-runbooks/_index.md

Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>

* lint

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
(cherry picked from commit f737a53)
gotjosh added a commit that referenced this pull request Oct 8, 2024
… error (#9567) (#9569)

* When OOO native histograms are disabled it should be a client error

When ingesters replay data from Kafka, if they receive a non-client error they will continue to attempt to push the batch to the TSDB.

In this PR, I'm translating Native Histogram OOO errors into client errors to ensure that we can continue with the next set of write requests when replaying data.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Address review feedback

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/mimir/manage/mimir-runbooks/_index.md

Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>

* lint

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
(cherry picked from commit f737a53)

Co-authored-by: gotjosh <josue.abreu@gmail.com>
seizethedave added a commit that referenced this pull request Oct 16, 2024
This applies the same logic from #9567 to block-builder. This currently has the effect of stopping block-builder ingestion when an OOO native histogram is encountered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants