Skip to content

Commit

Permalink
Handle NotInitialized::ShuttingDown error in shard split (#8506)
Browse files Browse the repository at this point in the history
There is a race condition between timeline shutdown and the split task.
Timeline shutdown first shuts down the upload queue, and only then fires
the cancellation token. A parallel running timeline split operation
might thus encounter a cancelled upload queue before the cancellation
token is fired, and print a noisy error.

Fix this by mapping `anyhow::Error{ NotInitialized::ShuttingDown }) to
`FlushLayerError::Cancelled` instead of `FlushLayerError::Other(_)`.


Fixes #8496
  • Loading branch information
arpad-m authored Jul 26, 2024
1 parent 857a182 commit 8e02db1
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions pageserver/src/tenant/timeline.rs
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ use self::layer_manager::LayerManager;
use self::logical_size::LogicalSize;
use self::walreceiver::{WalReceiver, WalReceiverConf};

use super::config::TenantConf;
use super::{config::TenantConf, upload_queue::NotInitialized};
use super::{debug_assert_current_span_has_tenant_and_timeline_id, AttachedTenantConf};
use super::{remote_timeline_client::index::IndexPart, storage_layer::LayerFringe};
use super::{remote_timeline_client::RemoteTimelineClient, storage_layer::ReadableLayer};
Expand Down Expand Up @@ -642,7 +642,13 @@ impl FlushLayerError {
// When crossing from generic anyhow errors to this error type, we explicitly check
// for timeline cancellation to avoid logging inoffensive shutdown errors as warn/err.
fn from_anyhow(timeline: &Timeline, err: anyhow::Error) -> Self {
if timeline.cancel.is_cancelled() {
let cancelled = timeline.cancel.is_cancelled()
// The upload queue might have been shut down before the official cancellation of the timeline.
|| err
.downcast_ref::<NotInitialized>()
.map(NotInitialized::is_stopping)
.unwrap_or_default();
if cancelled {
Self::Cancelled
} else {
Self::Other(Arc::new(err))
Expand Down

1 comment on commit 8e02db1

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3132 tests run: 3011 passed, 0 failed, 121 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_sharding_split_compaction[None]: release

Code coverage* (full report)

  • functions: 32.7% (7002 of 21414 functions)
  • lines: 50.1% (55805 of 111282 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
8e02db1 at 2024-07-26T01:09:58.861Z :recycle:

Please sign in to comment.