gc: improve the performance of Juicefs gc command #5683

Open
wants to merge 2 commits into
base: main


SonglinLife

Improve the file deletion performance by processing multiple files in parallel

ref: #5671


CLAassistant commented Feb 19, 2025

CLA assistant check
All committers have signed the CLA.

@SonglinLife
Author

I have submitted a pull request for issue #5671, which aims to improve the performance of the gc command, especially when handling a large number of small files. I would appreciate it if someone could take a moment to review it.

pkg/meta/tkv.go Outdated
return err
batchSize := 100000

threads := m.conf.MaxDeletes / 3
Contributor

Adding a new parameter would make this clearer.

Author

Thanks for your comment. I will add a new flag, scan-threads, in cmd/gc.go and a corresponding variable, CleanupScanThreads, in the config.

startKey := m.fmtKey("D")
endKey := nextKey(startKey)
for {
keys, values, err := m.scan(startKey, endKey, batchSize, func(k, v []byte) bool {
Contributor

Use client.scan directly.

Author

client.scan is a full-scan method that retrieves all pending-delete files at once, so it may not be suitable for handling a large number of small files.
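
To make the distinction concrete, here is a minimal sketch of a batched range scan that holds at most one batch of keys in memory at a time, in contrast to a full scan that returns everything. It uses an in-memory sorted slice instead of a real KV client; `scanBatch` and `scanInBatches` are hypothetical helpers, not JuiceFS functions.

```go
package main

import (
	"fmt"
	"sort"
)

// scanBatch returns up to limit keys in [start, end) from a sorted slice.
// It is a hypothetical in-memory stand-in for a range scan on a KV store.
func scanBatch(sorted []string, start, end string, limit int) []string {
	var out []string
	for i := sort.SearchStrings(sorted, start); i < len(sorted) && sorted[i] < end && len(out) < limit; i++ {
		out = append(out, sorted[i])
	}
	return out
}

// scanInBatches walks [start, end) one batch at a time, calls handle on
// each batch, and returns the number of batches it processed.
func scanInBatches(sorted []string, start, end string, batchSize int, handle func([]string)) int {
	if batchSize < 1 {
		batchSize = 1
	}
	batches := 0
	for {
		keys := scanBatch(sorted, start, end, batchSize)
		if len(keys) == 0 {
			return batches
		}
		handle(keys)
		batches++
		if len(keys) < batchSize {
			// A short batch means the range is exhausted.
			return batches
		}
		// Resume from the smallest key strictly greater than the last one seen.
		start = keys[len(keys)-1] + "\x00"
	}
}

func main() {
	keys := []string{"A1", "E1"}
	for i := 1; i <= 10; i++ {
		keys = append(keys, fmt.Sprintf("D%04d", i))
	}
	sort.Strings(keys)

	total := 0
	batches := scanInBatches(keys, "D", "E", 4, func(batch []string) {
		total += len(batch)
	})
	fmt.Printf("scanned %d keys in %d batches\n", total, batches)
}
```

The trade-off is more round trips to the store in exchange for bounded memory use and the ability to start deleting before the whole range has been read.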

return fmt.Errorf("invalid key %x", key)
for i := 0; i < threads; i++ {
wg.Add(1)
go func() {
Contributor

If an error occurs partway through, the gc command should report it. The same applies to the Redis and SQL implementations.

@@ -70,6 +70,11 @@ $ juicefs gc redis://localhost --delete`,
Value: 10,
Usage: "number threads to delete leaked objects",
},
&cli.IntFlag{
Name: "cleanup-threads",
Contributor

I think we can reuse the existing threads flag for cleanup.

deleteFileChan := make(chan redis.Z, threads)
var wg sync.WaitGroup

for i := 0; i < threads; i++ {
Contributor

Can we move this part into base.go to reduce the duplicated code?

4 participants