batch delete from gc #5404

trinity-1686a · 2024-09-09T13:59:44Z

Description

follow up to #5380
If a cluster generates fewer than 1k splits per 10 minutes, everything is fine. If it generates more than that, we get a slowdown for 2 reasons:

we no longer run 10 tasks in parallel
we may get a batch of 1k splits from many different indexes, then a 2nd batch from the same set of indexes, meaning more than one delete per index even though each index has few splits to delete. In theory this can cause up to n_index times more delete calls to the metastore than necessary (capped at 1k)
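
A toy model of the second point may help. This is not code from the PR. It assumes one metastore delete call per distinct index in each batch, and `count_delete_calls` is a hypothetical helper name.

```rust
use std::collections::HashSet;

/// Counts metastore delete calls for a stream of splits, identified here only
/// by the index UID each split belongs to.
/// Assumption: each batch triggers one delete call per distinct index it contains.
fn count_delete_calls(index_uids: &[&str], batch_size: usize) -> usize {
    index_uids
        .chunks(batch_size)
        .map(|batch| batch.iter().collect::<HashSet<_>>().len())
        .sum()
}
```

With 4 indexes holding 2 deletable splits each and a batch size of 4, the interleaved order `["a", "b", "c", "d", "a", "b", "c", "d"]` costs 8 delete calls, while the same splits sorted by index cost 4.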

Changes made:

we run up to 10 deletions from storage + metastore concurrently
we retrieve splits sorted by index_uid from the metastore, so we don't issue multiple deletes per index
we query for 10k splits instead of 1k: before we would query for 10 * 1k, but as this is now batched, we can increase this number to keep a similar effect to before
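
The changes above can be sketched roughly as follows. This is a std-only illustration, not the code in `garbage_collection.rs`. The names `group_by_index`, `delete_concurrently` and `MAX_CONCURRENT_DELETES` are hypothetical, and running threads in waves of 10 is a simplification of "up to 10 concurrently".

```rust
use std::thread;

/// Assumed concurrency limit, taken from the "up to 10" in the description.
const MAX_CONCURRENT_DELETES: usize = 10;

/// Turns `(index_uid, split_id)` pairs, already sorted by index UID,
/// into one batch of split IDs per index.
fn group_by_index(sorted: &[(String, String)]) -> Vec<(String, Vec<String>)> {
    let mut groups: Vec<(String, Vec<String>)> = Vec::new();
    for (index_uid, split_id) in sorted {
        let same_index = groups.last().map_or(false, |(uid, _)| uid == index_uid);
        if !same_index {
            // A new index UID starts a new batch.
            groups.push((index_uid.clone(), Vec::new()));
        }
        groups.last_mut().unwrap().1.push(split_id.clone());
    }
    groups
}

/// Calls `delete_batch` once per index, with at most
/// `MAX_CONCURRENT_DELETES` calls running at the same time.
fn delete_concurrently<F>(groups: &[(String, Vec<String>)], delete_batch: F)
where
    F: Fn(&str, &[String]) + Sync,
{
    for wave in groups.chunks(MAX_CONCURRENT_DELETES) {
        // The scope joins every thread of this wave before the next wave starts.
        thread::scope(|scope| {
            for (index_uid, split_ids) in wave {
                let delete_batch = &delete_batch;
                scope.spawn(move || delete_batch(index_uid.as_str(), split_ids.as_slice()));
            }
        });
    }
}
```

Because the input is sorted by index UID, each index ends up in exactly one batch, so `delete_batch` runs once per index no matter how many splits that index contributes.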

How was this PR tested?

Tested on a small cluster to confirm it is not utterly broken. Measuring the performance improvement will need a much bigger cluster.

trinity-1686a · 2024-09-09T14:03:46Z

quickwit/quickwit-metastore/src/metastore/postgres/utils.rs

+            sql.join(
+                JoinType::Join,
+                Indexes::Table,
+                Expr::col((Splits::Table, Splits::IndexUid))
+                    .equals((Indexes::Table, Indexes::IndexUid)),
+            )


note: I think doing this is a bad idea long term, but I'd rather make this fix without a migration for now, so we can easily revert affected clusters to a previous version if we find things are somehow still not good. If you disagree, we can either create the required index manually or add a migration.

github-actions · 2024-09-09T14:35:46Z

On SSD:

Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3358, ref commit: ec951aa

On GCS:

Average search latency is 1.2x that of the reference (lower is better).
Ref run id: 3359, ref commit: ec951aa

trinity-1686a added 2 commits September 9, 2024 10:46

better batching of delete operations

ffbeb5b

run deletion from gc concurrently

2937cbb

trinity-1686a requested a review from fulmicoton September 9, 2024 13:59

trinity-1686a commented Sep 9, 2024

trinity-1686a commented Sep 10, 2024

fulmicoton and others added 2 commits September 10, 2024 17:10

refactoring attempt

eb60ebe

rustfmt and clippy

b426d80

fulmicoton approved these changes Sep 10, 2024

Merge branch 'main' into trinity/batch-delete

6ff6430

fulmicoton merged commit 20b4956 into main Sep 10, 2024
5 checks passed

fulmicoton deleted the trinity/batch-delete branch September 10, 2024 23:56
