Skip to content

Conversation

@rtibbles
Copy link
Member

@rtibbles rtibbles commented Nov 6, 2025

Summary

Our assumptions about SQLite write concurrency have been that opening a transaction would wait for the write lock to be acquired, and that it would wait up until the 100s timeout that we have configured before raising a db locked error.
Unfortunately, this is not the case: https://forum.djangoproject.com/t/sqlite-and-database-is-locked-error/26994/6

Instead, because the default transaction type is DEFERRED, a write lock is not acquired when the transaction is opened, but only on the first write operation. If a write lock cannot be acquired, SQLite does not respect the 100 second timeout, and instead immediately throws a DatabaseLocked error.

Our expectations are significantly better satisfied by the BEGIN IMMEDIATE transaction, which opens a write lock immediately when the transaction is opened. So, to make Kolibri behave the way we expect, this PR creates a custom SQLite database backend that implements this behaviour.

The possible downside to this is if we wrapped every request in an atomic transaction, so every request was acquring a write lock on the database - we do not use this Django option, so I think that is not an issue. However, we should continue not to use ATOMIC_REQUESTS = True. This may also cause some performance degradation if any parts of our codebase are opening transactions for long periods of time, but as this solves a critical issue for data collection, I think this is reasonable to move forward with for now.

References

Running the locust load tests against this PR results in no failures:
image

This is in contrast to running without this fix that results in a relatively high percentage of failures, with all of them happening in the trackprogress endpoint, being indicative of potential data loss:
Screenshot From 2025-10-23 10-38-41

Reviewer guidance

Run the performance test.
Check the reported 503 during class copying is fixed by this.

…etter matches intended behaviour in our codebase.
@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: small labels Nov 6, 2025
@marcellamaki marcellamaki self-assigned this Nov 6, 2025
Copy link
Member

@marcellamaki marcellamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good. I ran the locust test and got 0% failures

Image

(whereas when I tested yesterday -- just for the baseline when adding the framework, I was over 80% at the end of the test. Yikes!)

The class copying 503 has been resolved.

@pcenov -- we are at a bit of a weird state in terms of manual QA for this, as this includes a partial fix for the class copying issues that you reported.

The expected state, in this PR, is for there to be no error page as you reported in the second video here and for the class to copy.

however, there is still a bug where the "removed" coaches get copied, despite being deleted.

here's a video to illustrate.

Screen.Recording.2025-11-05.at.9.37.47.PM.mov

manual QA to confirm that you aren't able to reproduce the error page that you were previously should be all that's needed here (just to double check my testing :D we know yours is better)

@pcenov
Copy link
Member

pcenov commented Nov 6, 2025

@marcellamaki thanks for the guidance! I confirm that the previously reported error on copying a class is fixed now.

Copy link
Member

@marcellamaki marcellamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with QA and code reviews passing, this is ready to go! 🚀

@marcellamaki marcellamaki merged commit 3938485 into learningequality:develop Nov 6, 2025
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DEV: backend Python, databases, networking, filesystem... SIZE: small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants