Fix lost concurrent CAgg updates #6443
Merged: jnidzwetzki merged 1 commit into timescale:main from jnidzwetzki:concurrent_cagg_refresh on Jan 2, 2024
23b1770 to 5ad1972
jnidzwetzki commented Dec 19, 2023
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6443      +/-   ##
==========================================
- Coverage   87.33%   87.31%   -0.02%
==========================================
  Files         187      187
  Lines       41869    41834      -35
  Branches     9320     9304      -16
==========================================
- Hits        36567    36529      -38
  Misses       3626     3626
- Partials     1676     1679       +3
6434b30 to 21dc711
@gayyappan, @fabriziomello: please review this pull request.
d3247b0 to 1702083
1702083 to 3694cc4
When two CAggs on the same hypertable are refreshed at the same time, a race condition could occur on the invalidation threshold table. So far, the table has been locked with a non-self-conflicting lock. Therefore, both scanners ran at the same time, but only one was able to lock the threshold value with a proper tuple lock. The other scanner ignored the failed lock and just returned. As a result, the field computed_invalidation_threshold was never populated and still contained 0. So, invalidation_threshold_set_or_get returned a refresh end of 0. As a consequence, the `if (refresh_window.start >= refresh_window.end)` branch in continuous_agg_refresh_internal could be taken, and the refresh returned without doing any work. This patch adds proper error reporting and also implements retry logic to avoid these problems. A self-conflicting lock is not used due to the problems discussed in timescale#5809.
3694cc4 to 6df2989
svenklemm approved these changes Jan 2, 2024
antekresic approved these changes Jan 2, 2024
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 3, 2024

This release contains bug fixes since the 2.13.0 release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 4, 2024

This release contains bug fixes since the 2.13.0 release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6405 Read CAgg watermark from materialized data
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
When two CAggs on the same hypertable are refreshed at the same time, a race condition could occur on the invalidation threshold table.

So far, the table has been locked with a non-self-conflicting lock. Therefore, both scanners ran at the same time, but only one was able to lock the threshold value with a proper tuple lock. The other scanner ignored the failed lock and just returned. As a result, the field `computed_invalidation_threshold` was never populated and still contained 0.

So, `invalidation_threshold_set_or_get` returned a refresh end of 0. As a consequence, the `if (refresh_window.start >= refresh_window.end)` branch in `continuous_agg_refresh_internal` could be taken, and the refresh returned without doing any work.

This patch adds proper error reporting and also implements retry logic (inspired by `RelationFindReplTupleSeq`) to avoid these problems. A self-conflicting lock is not used due to the problems discussed in #5809.
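
To make the failure mode and the retry idea concrete, here is a minimal, self-contained simulation in C. It is not the TimescaleDB code: `ThresholdRow`, `try_lock_row`, `set_or_get_locked`, `threshold_ignoring_lock_failure`, `threshold_with_retry`, and `refresh_window_is_empty` are hypothetical names, the tuple lock is faked with a counter, and the "set if larger, otherwise return the stored value" behavior is an assumption about what a set-or-get helper does. Only the `start >= end` check is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of a tuple lock attempt (not a PostgreSQL type). */
typedef enum { LOCK_OK, LOCK_BUSY } LockResult;

/* Hypothetical row of the invalidation threshold table. */
typedef struct {
    int64_t threshold;  /* stored invalidation threshold */
    int busy_attempts;  /* simulation only: lock attempts that will still fail */
} ThresholdRow;

typedef struct {
    int64_t start;
    int64_t end;
} RefreshWindow;

/* Simulated tuple lock: fails while a concurrent refresh holds the row. */
static LockResult try_lock_row(ThresholdRow *row)
{
    if (row->busy_attempts > 0) {
        row->busy_attempts--;
        return LOCK_BUSY;
    }
    return LOCK_OK;
}

/* Assumed semantics: raise the threshold if the candidate is larger,
 * then return whatever is stored. */
static int64_t set_or_get_locked(ThresholdRow *row, int64_t candidate)
{
    if (candidate > row->threshold)
        row->threshold = candidate;
    return row->threshold;
}

/* Failure mode from the description: a failed lock is silently ignored,
 * so the computed value keeps its initial 0. */
static int64_t threshold_ignoring_lock_failure(ThresholdRow *row, int64_t candidate)
{
    int64_t computed = 0;
    if (try_lock_row(row) == LOCK_OK)
        computed = set_or_get_locked(row, candidate);
    return computed;
}

/* Retry variant: returns true and fills *computed on success, or false
 * so that the caller can report an error instead of using a bogus 0. */
static bool threshold_with_retry(ThresholdRow *row, int64_t candidate,
                                 int max_attempts, int64_t *computed)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (try_lock_row(row) == LOCK_OK) {
            *computed = set_or_get_locked(row, candidate);
            return true;
        }
        /* A real implementation would wait for the lock holder to finish
         * before scanning again; the simulation just loops. */
    }
    return false;
}

/* The early-exit check quoted in the description. */
static bool refresh_window_is_empty(RefreshWindow w)
{
    return w.start >= w.end;
}
```

With a row that is busy for one attempt, `threshold_ignoring_lock_failure` yields 0, so a window starting at any non-negative value is treated as empty and the refresh is skipped. `threshold_with_retry` succeeds on the second attempt in the same situation, and returns `false` when the attempts run out, which is where an error would be reported.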