Vacuum / Merge small files command #87

Open
zhousun opened this issue Jan 16, 2025 · 2 comments
Labels
feature good first issue Good for newcomers

Comments

@zhousun
Contributor

zhousun commented Jan 16, 2025

What feature are you requesting?

Currently, each write statement creates a new Parquet file, so if inserts are not batched, columnstore tables can accumulate many small files, which hurts query performance and increases S3 costs.
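To make the scale of the problem concrete, here is a minimal sketch of the file count under the behavior described above, assuming exactly one Parquet file per write statement. The function name and the row counts are illustrative assumptions, not measurements from the project.

```python
import math


def files_created(total_rows: int, rows_per_statement: int) -> int:
    """Number of Parquet files written, assuming one file per write statement."""
    return math.ceil(total_rows / rows_per_statement)


# Hypothetical workload: 100,000 rows inserted one at a time vs. in batches.
unbatched = files_created(100_000, 1)
batched = files_created(100_000, 10_000)
print(unbatched, batched)  # prints "100000 10"
```

Under this assumption, the same data lands in 100,000 files when inserted row by row and in 10 files when batched, which is the gap a compaction command would close after the fact.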

The current workaround is to run a no-op update query, UPDATE T SET a = a, which rewrites the table into its optimal state.

However, a built-in vacuum or compaction command that merges small files into larger ones would be ideal.

Why are you requesting this feature?

Optimize a table.

What is your proposed implementation for this feature?

Hook the VACUUM command. Find small Parquet files, then use the update/delete code path to rewrite them.
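The "find small Parquet files" step could be sketched roughly as follows. This is only an illustration of one possible selection strategy (greedy grouping by size), not the project's implementation: the `DataFile` type, the `plan_compaction` function, and both size thresholds are hypothetical, since the issue does not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the issue does not specify any sizes.
SMALL_FILE_BYTES = 16 * 1024 * 1024    # files below this count as "small"
TARGET_FILE_BYTES = 128 * 1024 * 1024  # aim for merged files near this size


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int


def plan_compaction(files, small=SMALL_FILE_BYTES, target=TARGET_FILE_BYTES):
    """Group small files into batches whose combined size stays within target.

    Returns a list of batches (lists of DataFile). Batches with a single file
    are dropped because rewriting one file alone merges nothing.
    """
    candidates = sorted(
        (f for f in files if f.size_bytes < small),
        key=lambda f: f.size_bytes,
    )
    batches, current, current_size = [], [], 0
    for f in candidates:
        # Close the current batch when adding this file would exceed the target.
        if current and current_size + f.size_bytes > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)
    return [b for b in batches if len(b) > 1]
```

Each returned batch would then be handed to the existing update/delete code path to be rewritten as a single larger file. Files at or above the small-file threshold are left untouched, so a table that is already compact produces an empty plan.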

@zhousun zhousun added feature good first issue Good for newcomers labels Jan 16, 2025
@nbiscaro
Contributor

nbiscaro commented Jan 26, 2025

I can grab this!

Note that it may also make sense to implement deletion vector file compaction here once deletion vectors eventually move out of heap tables.

@dpxcc dpxcc assigned nbiscaro and unassigned nbiscaro Jan 27, 2025
@dpxcc
Contributor

dpxcc commented Jan 28, 2025

Discussed on Slack. @nbiscaro will be working on #89 instead
