⚡ [#677] Denormalize ObjectRecord by adding _object_type #678
Codecov Report

❌ Patch coverage is

```diff
@@           Coverage Diff            @@
##           master     #678    +/-  ##
=======================================
+ Coverage   83.16%   83.96%   +0.80%
=======================================
  Files         128      131       +3
  Lines        2429     2488      +59
  Branches      193      198       +5
=======================================
+ Hits         2020     2089      +69
+ Misses        369      354      -15
- Partials       40       45       +5
```
Force-pushed 75ae054 to 11069a3
Force-pushed 5a456c3 to ae2ca7a
```python
if not ids:
    break

ObjectRecord.objects.filter(id__in=ids).update(
```
A single batch of documents in our database takes about 18 seconds. With a total of 3 million records, this operation would take about 90 minutes. That is a bit much for a database migration. Since this migration is nothing more than copying data from one table to another, would it be an idea to simply run an `INSERT INTO ... SELECT FROM`?
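Since the new column lives on the existing records table rather than a new one, the set-based equivalent of the suggested `INSERT INTO ... SELECT` would be a single `UPDATE ... FROM`. A minimal sketch, assuming hypothetical table and column names (`core_objectrecord`, `core_object`, `object_id`, `object_type_id`) that are not confirmed by this PR:

```python
# Hypothetical set-based backfill: one statement instead of per-batch updates.
# All table and column names below are assumptions about the schema.
BACKFILL_SQL = """
UPDATE core_objectrecord AS record
SET _object_type_id = obj.object_type_id
FROM core_object AS obj
WHERE record.object_id = obj.id
  AND record._object_type_id IS NULL;
"""


def forward(apps, schema_editor):
    # Postgres plans a single hash join over both tables here, instead of
    # re-scanning the big table once per batch as the loop version does.
    schema_editor.execute(BACKFILL_SQL)
```

The trade-off is that a single statement updates all rows in one transaction, which is fast but holds row locks for the whole run; the batched approach the PR chose keeps the system online at the cost of duration.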
Force-pushed 31af9a7 to 5408566
src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py (review thread outdated, resolved)
src/objects/core/migrations/0034_alter_objectrecord__object_type_and_more.py (review thread outdated, resolved)
Force-pushed 7ca76cb to aa51878
src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py (review thread outdated, resolved)
```python
def backfill_object_type_batch(apps, cursor, batch_size):
    cursor.execute(
        """
        WITH batch AS (
```
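The diff is truncated above, but one plausible shape for the batched statement is a CTE that selects a batch of unfilled rows joined to their type, then updates from it. This is a sketch only; the column and table names are assumptions, not the PR's actual SQL:

```python
def backfill_object_type_batch(cursor, batch_size):
    """One hypothetical iteration of the batched backfill.

    Returns the number of rows updated, so the caller's `while True`
    loop knows when to stop. Schema names are assumptions.
    """
    cursor.execute(
        """
        WITH batch AS (
            SELECT record.id, obj.object_type_id
            FROM core_objectrecord AS record
            JOIN core_object AS obj ON record.object_id = obj.id
            WHERE record._object_type_id IS NULL
            LIMIT %(batch_size)s
        )
        UPDATE core_objectrecord AS record
        SET _object_type_id = batch.object_type_id
        FROM batch
        WHERE record.id = batch.id
        """,
        {"batch_size": batch_size},
    )
    return cursor.rowcount
```

Note the `WHERE record._object_type_id IS NULL` filter: without an index on that column, every iteration re-scans the table to find the next batch, which is exactly the concern raised below.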
Nice, that'll work. Though, I see the index is not applied yet. With 3 million objects, it will have to do a table scan on the `_object_type_id` column for every batch, I think(?). Why not apply the index before?
I figured that doing a bulk update with the index in place would be slower, because it would have to update that index as well, but I think you're right that without it, it will do full table scans (due to the changes in filtering).
I'll check which is faster locally.
With the index it seems to be a little bit faster, so I've changed it to add the index before the data migration
@stevenbal I do not see the index in the source code. Is that just the visualisation here, or is it indeed still applied afterwards?
In production it takes about 5 minutes per batch. There is no index on `objectrecord._object_type_id`, so it has to do full table scans over the whole table (3.6 million records). Note that the JSON data in each record is about 1 KB. Since Postgres is a row store, without indexes it loads whole rows into memory, so every iteration scans about 4 GB of data.
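One way to add the supporting index without blocking writers on a live system is to build it concurrently, outside a transaction. A sketch with hypothetical index and table names (and note that in Django, a migration running `CREATE INDEX CONCURRENTLY` must set `atomic = False`, since Postgres refuses to run it inside a transaction block):

```python
# Hypothetical: build the index concurrently so writes are not blocked
# while the 3.6M existing rows are indexed. The index/table names are
# assumptions, not taken from this PR.
CREATE_INDEX_SQL = (
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS core_objectrecord_object_type_idx "
    "ON core_objectrecord (_object_type_id);"
)

# Reverse operation for the migration, also non-blocking.
DROP_INDEX_SQL = (
    "DROP INDEX CONCURRENTLY IF EXISTS core_objectrecord_object_type_idx;"
)
```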
The migration has now been running for almost an hour, and we set the maximum to 90 minutes (we hoped that would be enough) :(
```python
def forward(apps, schema_editor):
    with connection.cursor() as cursor:
        while True:
```
This will work with online systems, nice. Theoretically, however, this may result in two potential problems:
- it may run forever if there is a constant influx of new objects
- it may still leave records with a null value, because other pods may write to the table even after this migration has completed (thus still requiring an offline migration)
Like I said, this is more of a theoretical thing, but it will happen if you are doing online migrations on busy systems. It requires a warning at least.
Actually, I see you change the field to NOT NULL after this migration. That will effectively prevent any null values, great. That operation may fail if an object was inserted between 0033 and 0034 by another pod without this change. Chances are slim, but not non-existent.
Hmm, yeah, that's true. If this makes it into a release, I'll make sure there's a warning mentioning this.
I only just noticed: does this also run in a single transaction? If you run all batches in a single transaction, it will effectively lock the whole objects database until the migration is done. That is assuming there are no other transactions holding a write lock on any of these records.
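The single-transaction concern can be addressed by committing after every batch. A sketch of such a driver loop, under assumptions (the real migration's structure is only partially visible above; `run_batch` stands in for the CTE-based update and returns the rows it touched):

```python
def forward_batched(connection, run_batch, batch_size=500):
    """Sketch of a non-atomic migration loop (names are assumptions).

    Committing after every batch means no single transaction holds row
    locks on the whole table for the full duration of the migration. In
    Django this also requires `atomic = False` on the Migration class,
    otherwise the framework wraps the whole thing in one transaction.
    """
    total = 0
    while True:
        updated = run_batch(connection, batch_size)
        connection.commit()  # release this batch's row locks immediately
        if updated == 0:
            break
        total += updated
    return total
```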
@stevenbal a new issue arose with this test-release; Theo mentioned it here: #677 (comment). It seems the UI is broken since this release (no way to change the object). Any idea if this issue is the cause?
I think we can confirm that this has greatly improved performance. Do you have an update on the duration of the upgrade? Any new improvements in that area?

@sdegroot I got the duration of the update locally from 2 hours down to about 30-40 minutes.
Force-pushed 1098964 to cfe2fd5
and add indexes to benefit from this denormalized column
This took the local duration for 3.8 million records (PG running with 2 CPUs and 4 GB mem) from about 2 hours to 35 minutes:
* run in parallel with 4 workers and a smaller batch size
* remove the index from _object_type before the data migration and add it back right after
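The parallel-workers idea can be sketched as follows. This is a hypothetical driver, not the PR's code: it assumes each worker gets its own connection and that the batch query claims rows with `FOR UPDATE SKIP LOCKED` (or equivalent) so two workers never update the same rows.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_backfill(make_connection, run_batch, workers=4, batch_size=500):
    """Hypothetical parallel driver for the backfill.

    Each worker opens its own connection and drains batches until none
    are left. `run_batch` returns the number of rows it updated; it must
    claim rows exclusively (e.g. FOR UPDATE SKIP LOCKED) for this to be
    safe with multiple workers -- that is an assumption about the PR.
    """
    def drain(_):
        conn = make_connection()
        total = 0
        while True:
            updated = run_batch(conn, batch_size)
            if updated == 0:
                return total
            total += updated

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(drain, range(workers)))
```

Smaller batches matter here because each batch holds its locks for a shorter time, which is also consistent with the timeout fix in a later commit.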
Force-pushed cfe2fd5 to ec551dc
Force-pushed 3eba5b1 to c07d91b
Force-pushed c07d91b to 25f2c71
because otherwise timeouts are raised with multiple workers
Force-pushed 25f2c71 to c0bcac7
Fixes #677
Reduces the number of joins by denormalizing `_object_type` on `ObjectRecord`. Indexes have also been added to improve performance. A data migration was added to backfill this denormalized column, which took about 1 hour to complete locally for 3.8 million ObjectRecords.
Main: https://github.com/maykinmedia/objects-api/actions/runs/18089517249/job/51466666860#step:8:646
After changes: https://github.com/maykinmedia/objects-api/actions/runs/18124724855/job/51577259367?pr=678#step:8:620
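The read-side effect of the denormalization can be illustrated with a hypothetical before/after query pair (table and column names are assumptions, not the project's actual schema):

```python
# Before: filtering records by object type required joining through the
# parent object table.
QUERY_BEFORE = """
SELECT record.*
FROM core_objectrecord AS record
JOIN core_object AS obj ON record.object_id = obj.id
WHERE obj.object_type_id = %(object_type_id)s;
"""

# After: the denormalized column answers the same question from one
# table, and the new index turns the filter into a cheap index scan.
QUERY_AFTER = """
SELECT record.*
FROM core_objectrecord AS record
WHERE record._object_type_id = %(object_type_id)s;
"""
```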
Changes