Skip to content

Fix database file size not shrinking when document_ttl triggers squash #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 21, 2025

Conversation

Darshan808
Copy link
Contributor

@Darshan808 Darshan808 commented Apr 30, 2025

Fixes y-crdt/pycrdt-websocket#116

Description

SQLite normally keeps freed pages in its internal freelist and only returns them to the OS when you rebuild the file. By turning on full auto-vacuum at database creation, any subsequent DELETE (including our TTL-based squash) will automatically release pages back to the OS—so the file will shrink as soon as old updates are purged.

Code changes

+ # enable full auto-vacuum & rebuild now
+ await cursor.execute("PRAGMA auto_vacuum = FULL")
+ await cursor.execute("VACUUM")

Verification

logging number of rows and db file size

# Updating after waiting for 2 seconds

Number of updates before: 124
SQLite file size before: 36864.00 Bytes
document_ttl:  1 

**Deleting history and squashing updates**

Number of updates after: 2
SQLite file size after: 36864.00 Bytes  # not immediately reflected in file size

Number of updates before: 2
SQLite file size before: 16384.00 Bytes # Size is reduced
document_ttl:  1 

@Darshan808
Copy link
Contributor Author

Should we also add logic to turn on full auto-vacuum and run an initial VACUUM for existing .db files?

@davidbrochart
Copy link
Collaborator

Should we also add logic to turn on full auto-vacuum and run an initial VACUUM for existing .db files?

I don't know if it's needed, actually a test would be helpful.

@krassowski krassowski added the bug Something isn't working label Apr 30, 2025
@Darshan808
Copy link
Contributor Author

Had to loop ~60 times in test since small updates don’t grow SQLite file size until a full page fills.

@Darshan808
Copy link
Contributor Author

Locally it passes with 60 loops; in CI, it passes on macOS but is failing on Ubuntu — working on a more robust solution.

Path(self.db_path).unlink(missing_ok=True)
except PermissionError:
# If db is still open, wait a bit and try again
sleep(0.1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep is async, so this has no effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad — we also don't need this handling because I was getting a permission error due to not properly closing the database in my test. It wasn't because the database was taking time to close.

@Darshan808 Darshan808 requested a review from davidbrochart May 6, 2025 15:50
Copy link
Collaborator

@davidbrochart davidbrochart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@krassowski
Copy link

It looks like it picked up a conflict due to refactor of files

@davidbrochart davidbrochart merged commit e8b6cd9 into y-crdt:main May 21, 2025
18 checks passed
Darshan808 added a commit to Darshan808/pycrdt-store that referenced this pull request Jul 9, 2025
…ash (y-crdt#1)

* Run VACUUM after document_ttl squash to reclaim freed pages

* test-for-size-reduction

* make-test-robust

* fix-typo

* improve-tests

* small-refactor

* run-pre-commit

* remove-sleep-and-delete
davidbrochart added a commit that referenced this pull request Jul 9, 2025
* fixtemp-db-connection

* add-test

* use-tg-in-test

* fix-test

* fix-test

* add-compression-to-yupdates

* make-backward-compatible

* add-compression-decompression-callbacks

* remove-none

* remove-staticmethod

* ignore-type

* fix-conflict-manually

* Update pyproject.toml

Co-authored-by: David Brochart <david.brochart@gmail.com>

* set--as-default

* resolve_conflict

* refactor_code

* modify-test

* Update pyproject.toml

Co-authored-by: David Brochart <david.brochart@gmail.com>

* modify-test

* fix

* Fix database file size not shrinking when `document_ttl` triggers squash (#1)

* Run VACUUM after document_ttl squash to reclaim freed pages

* test-for-size-reduction

* make-test-robust

* fix-typo

* improve-tests

* small-refactor

* run-pre-commit

* remove-sleep-and-delete

* Drop Jupyter Releaser (#14)

* Drop Jupyter Releaser

* Remove check_release

* Release v0.1.1

* retain-test

* fix-test

* fix-lint

---------

Co-authored-by: David Brochart <david.brochart@gmail.com>
Darshan808 added a commit to Darshan808/pycrdt-store that referenced this pull request Jul 21, 2025
…ash (y-crdt#1)

* Run VACUUM after document_ttl squash to reclaim freed pages

* test-for-size-reduction

* make-test-robust

* fix-typo

* improve-tests

* small-refactor

* run-pre-commit

* remove-sleep-and-delete
Darshan808 added a commit to Darshan808/pycrdt-store that referenced this pull request Jul 21, 2025
* fixtemp-db-connection

* add-test

* use-tg-in-test

* fix-test

* fix-test

* add-compression-to-yupdates

* make-backward-compatible

* add-compression-decompression-callbacks

* remove-none

* remove-staticmethod

* ignore-type

* fix-conflict-manually

* Update pyproject.toml

Co-authored-by: David Brochart <david.brochart@gmail.com>

* set--as-default

* resolve_conflict

* refactor_code

* modify-test

* Update pyproject.toml

Co-authored-by: David Brochart <david.brochart@gmail.com>

* modify-test

* fix

* Fix database file size not shrinking when `document_ttl` triggers squash (y-crdt#1)

* Run VACUUM after document_ttl squash to reclaim freed pages

* test-for-size-reduction

* make-test-robust

* fix-typo

* improve-tests

* small-refactor

* run-pre-commit

* remove-sleep-and-delete

* Drop Jupyter Releaser (y-crdt#14)

* Drop Jupyter Releaser

* Remove check_release

* Release v0.1.1

* retain-test

* fix-test

* fix-lint

---------

Co-authored-by: David Brochart <david.brochart@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting document_ttl does not lead to any decrease in the database size
3 participants