Vacuum / Merge small files command #87

zhousun · 2025-01-16T22:15:13Z

What feature are you requesting?

Currently, each write statement creates a new Parquet file, so if inserts are not batched, columnstore tables can accumulate many small files, which causes performance and S3 cost problems.

The current workaround is to run a no-op update query, UPDATE T SET a = a, which puts the table into an optimal state.

However, a built-in vacuum or compaction command that merges small files into larger ones would be ideal.

Why are you requesting this feature?

Optimize a table.

What is your proposed implementation for this feature?

Hook the VACUUM command. Find small Parquet files, then use the update/delete code path to rewrite them.
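
The "find small Parquet files" step could be sketched roughly as below. This is only an illustration of the selection logic, not the project's implementation: `DataFile`, `plan_compaction`, and both size parameters are hypothetical names, and the grouping is a simple next-fit over files sorted by size. Each returned batch would then be handed to the existing update/delete rewrite path.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    # Hypothetical descriptor for one Parquet file backing a table.
    path: str
    size_bytes: int


def plan_compaction(
    files: List[DataFile],
    small_threshold: int,
    target_size: int,
) -> List[List[DataFile]]:
    """Group files smaller than small_threshold into batches to rewrite.

    A batch is closed as soon as adding the next file would push its total
    size past target_size. Both thresholds are assumptions for this sketch.
    """
    # Only files below the threshold are candidates; sort for determinism.
    small = sorted(
        (f for f in files if f.size_bytes < small_threshold),
        key=lambda f: (f.size_bytes, f.path),
    )

    batches: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for f in small:
        if current and current_size + f.size_bytes > target_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)

    # Rewriting a batch that holds a single file would not merge anything.
    return [b for b in batches if len(b) > 1]
```

For example, calling `plan_compaction` with files of 10, 20, 30, 40, and 500 bytes, `small_threshold=100`, and `target_size=60` groups the three smallest files into one batch and leaves the other two untouched.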

nbiscaro · 2025-01-26T22:55:08Z

I can grab this!

Note that it may also make sense to implement deletion vector file compaction here once deletion vectors eventually move out of heap tables.

dpxcc · 2025-01-28T01:16:46Z

Discussed on Slack. @nbiscaro will be working on #89 instead

zhousun added the feature and good first issue labels Jan 16, 2025

dpxcc assigned nbiscaro and unassigned nbiscaro Jan 27, 2025
