
⚡ [#677] Denormalize ObjectRecord by adding _object_type#678

Merged
stevenbal merged 7 commits into master from
issue/677-get-objects-performance
Oct 28, 2025

Conversation

@stevenbal (Collaborator) commented Sep 26, 2025

Fixes #677

Reduces the number of joins by denormalizing _object_type on ObjectRecord. Indexes have also been added to improve performance.

A data migration was added to backfill this denormalized column. It took about 1 hour to complete locally for 3.8 million ObjectRecords.

Main: https://github.com/maykinmedia/objects-api/actions/runs/18089517249/job/51466666860#step:8:646
After changes: https://github.com/maykinmedia/objects-api/actions/runs/18124724855/job/51577259367?pr=678#step:8:620

Changes

  • Denormalize ObjectRecord by adding _object_type and add indexes to benefit from this denormalized column
  • Use a non-superuser token and add a performance test for the objecttype filter

@stevenbal stevenbal marked this pull request as draft September 26, 2025 12:16
codecov-commenter commented Sep 26, 2025

Codecov Report

❌ Patch coverage is 98.59155% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.96%. Comparing base (535ca32) to head (c0bcac7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/objects/api/serializers.py | 66.66% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #678      +/-   ##
==========================================
+ Coverage   83.16%   83.96%   +0.80%     
==========================================
  Files         128      131       +3     
  Lines        2429     2488      +59     
  Branches      193      198       +5     
==========================================
+ Hits         2020     2089      +69     
+ Misses        369      354      -15     
- Partials       40       45       +5     


@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch 4 times, most recently from 75ae054 to 11069a3, on September 26, 2025 15:17
@stevenbal stevenbal mentioned this pull request Sep 29, 2025
14 tasks
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch 11 times, most recently from 5a456c3 to ae2ca7a, on September 30, 2025 10:07
@stevenbal stevenbal changed the title from "⚡ [#677] Add ObjectRecord index on object,index,start_at and end_at" to "⚡ [#677] Denormalize ObjectRecord by adding _object_type" on Sep 30, 2025
if not ids:
    break

ObjectRecord.objects.filter(id__in=ids).update(

A single batch of documents in our database takes about 18 seconds. With a total of 3 million, this operation would take about 90 minutes. That is a bit much for a database migration. Since this migration is nothing more than copying data from one table to another, would it be an idea to simply run an INSERT INTO ... SELECT ... FROM statement?
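
To illustrate the set-based idea from the comment above, here is a minimal sketch. It is not the migration from this PR: the table and column names (`object`, `objectrecord`, `object_type_id`) are a simplified, hypothetical schema, and SQLite from the Python standard library stands in for PostgreSQL so the snippet can run on its own. Because the target column lives on rows that already exist, the sketch uses a single UPDATE with a correlated subquery rather than a literal INSERT INTO ... SELECT.

```python
import sqlite3

# Hypothetical, simplified schema: names are illustrative, not the project's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
    CREATE TABLE objectrecord (
        id INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES object (id),
        _object_type_id INTEGER  -- new denormalized column, NULL until backfilled
    );
    INSERT INTO object (id, object_type_id) VALUES (1, 10), (2, 20);
    INSERT INTO objectrecord (id, object_id) VALUES (1, 1), (2, 1), (3, 2);
    """
)

# One set-based statement instead of a Python loop over batches of ids.
conn.execute(
    """
    UPDATE objectrecord
    SET _object_type_id = (
        SELECT o.object_type_id FROM object AS o WHERE o.id = objectrecord.object_id
    )
    WHERE _object_type_id IS NULL
    """
)
conn.commit()

backfilled = conn.execute(
    "SELECT id, _object_type_id FROM objectrecord ORDER BY id"
).fetchall()
print(backfilled)
```

The trade-off is that a single statement over millions of rows holds its locks for the whole run, which is why batching may still be preferable on a live system.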

@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 31af9a7 to 5408566, on October 2, 2025 08:53
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 7ca76cb to aa51878, on October 2, 2025 09:54
@stevenbal stevenbal requested a review from sdegroot October 2, 2025 10:06
def backfill_object_type_batch(apps, cursor, batch_size):
    cursor.execute(
        """
        WITH batch AS (

Nice, that'll work. Though I see the index is not applied yet. With 3 million objects, I think it will have to do table scans on the _object_type_id column for every batch. Why not apply the index first?

Collaborator Author

I figured that doing a bulk update with the index would be slower, because it would have to update this index as well, but I think you're right that in that case it will do full table scans (due to the changes in filtering).

I'll check which is faster locally

Collaborator Author

With the index it seems to be a little bit faster, so I've changed it to add the index before the data migration


@stevenbal I do not see the index in the source code. Is that just the visualisation here, or is it indeed still applied afterwards?

In production it takes about 5 minutes per batch. There is no index on objectrecord._object_type_id, so each batch has to do a full scan of the whole table (3.6 million records). Note that the JSON data in each record is about 1 kB. Since PostgreSQL is a row-oriented database, without an index it loads whole rows into memory, so it has to scan about 4 GB of data on every iteration.

The migration has now been running for almost an hour and we have set the maximum to 90 minutes (we hoped that would be enough) :(
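
The effect described above (a per-batch filter on an unindexed column forcing a full scan) can be checked with the query planner. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library instead of PostgreSQL, and the table, column, and index names are hypothetical. On PostgreSQL the equivalent check would be an `EXPLAIN` of the batch query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectrecord (id INTEGER PRIMARY KEY, data TEXT, _object_type_id INTEGER)"
)
# Half of the rows are still waiting to be backfilled (NULL).
conn.executemany(
    "INSERT INTO objectrecord (data, _object_type_id) VALUES (?, ?)",
    [("x" * 100, None if i % 2 else 1) for i in range(1000)],
)

# The kind of query a batched backfill runs on every iteration.
QUERY = "SELECT id FROM objectrecord WHERE _object_type_id IS NULL LIMIT 100"


def plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(conn)
conn.execute(
    "CREATE INDEX objectrecord_object_type_idx ON objectrecord (_object_type_id)"
)
plan_with_index = plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Comparing the two plans shows whether the planner can use the index for the NULL filter or has to walk the whole table.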


def forward(apps, schema_editor):
    with connection.cursor() as cursor:
        while True:

This will work with online systems, nice. In theory, however, it may result in two potential problems:

  1. it may run forever if there is a constant influx of new objects
  2. it may still result in records with a null value, because other pods may still write to the table even after this migration has completed (thus still requiring an offline migration)

Like I said, this is more of a theoretical thing, but it will happen if you are doing online migrations on busy systems. It requires a warning at least.


Actually, I see you change the field to NOT NULL after this migration. That will effectively prevent any null values, great. That operation may fail if, between 0033 and 0034, an object was inserted by another pod without this change. Chances are slim but not non-existent.

Collaborator Author

Hmm yeah that's true, if this makes it into a release I'll make sure there's a warning mentioning this


I only just noticed: does this also run in a single transaction? If you run all batches in a single transaction, it will effectively lock the whole objects database until the migration is done. That is assuming there are no other transactions holding a write lock on any of these records.
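
The concerns in this thread (an unbounded loop, rows that are still NULL afterwards, and one long transaction) can be sketched together. This is a minimal illustration, not the code from this PR: the schema and the `backfill_in_batches` helper are hypothetical, and SQLite stands in for PostgreSQL. In a Django migration, committing per batch would additionally require the migration not to run inside one wrapping transaction (the `atomic = False` attribute on the `Migration` class).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
    CREATE TABLE objectrecord (
        id INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES object (id),
        _object_type_id INTEGER
    );
    INSERT INTO object (id, object_type_id) VALUES (1, 10), (2, 20);
    INSERT INTO objectrecord (object_id) VALUES (1), (1), (2), (2), (1);
    """
)


def backfill_in_batches(connection, batch_size=2, max_batches=1000):
    """Backfill in small committed batches; return the number of rows still NULL."""
    for _ in range(max_batches):  # upper bound so a constant influx cannot loop forever
        cursor = connection.execute(
            """
            UPDATE objectrecord
            SET _object_type_id = (
                SELECT o.object_type_id FROM object AS o WHERE o.id = objectrecord.object_id
            )
            WHERE id IN (
                SELECT id FROM objectrecord WHERE _object_type_id IS NULL LIMIT ?
            )
            """,
            (batch_size,),
        )
        connection.commit()  # one transaction per batch, so locks are released in between
        if cursor.rowcount == 0:
            break
    # Rows written concurrently by old code could still be NULL: check before NOT NULL.
    return connection.execute(
        "SELECT COUNT(*) FROM objectrecord WHERE _object_type_id IS NULL"
    ).fetchone()[0]


remaining = backfill_in_batches(conn)
print("rows still NULL:", remaining)
```

A non-zero return value would signal that the follow-up NOT NULL migration is likely to fail and that the backfill should be re-run first.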

github-actions bot (Contributor) commented Oct 13, 2025

🐰 Bencher Report

Branch: issue/677-get-objects-performance
Testbed: ubuntu-latest

| Benchmark | Latency (ms) | Result Δ% | Baseline (ms) | Upper Boundary (ms) | Limit % |
|---|---|---|---|---|---|
| performance_test/tests/test_objects_list.py::test_objects_api_list_filter_by_object_type | 109.84 | -25.87% | 148.18 | 155.59 | 70.60% |
| performance_test/tests/test_objects_list.py::test_objects_api_list_filter_one_result | 19.66 | -6.56% | 21.05 | 22.10 | 88.99% |
| performance_test/tests/test_objects_list.py::test_objects_api_list_large_page_size_page_1 | 263.76 | -9.03% | 289.94 | 304.44 | 86.64% |
| performance_test/tests/test_objects_list.py::test_objects_api_list_large_page_size_page_5 | 265.14 | -8.98% | 291.30 | 305.86 | 86.69% |
| performance_test/tests/test_objects_list.py::test_objects_api_list_small_page_size_page_20 | 123.24 | -18.08% | 150.43 | 157.95 | 78.02% |
🐰 View full continuous benchmarking report in Bencher

@sdegroot

@stevenbal a new issue arose with this test-release, Theo mentioned it here #677 (comment)

It seems the UI is broken since this release (no way to change the object). Any idea if this issue is the cause?

@sdegroot

I think we can confirm that this has greatly improved performance. Do you have an update on the duration of the upgrade? Any new improvements in that area?

@stevenbal
Collaborator Author

@sdegroot I got the duration of the update locally from 2 hours down to about 30-40 minutes

@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch from 1098964 to cfe2fd5 on October 23, 2025 10:59
@stevenbal stevenbal requested review from Floris272 and danielmursa-dev and removed request for sdegroot October 23, 2025 10:59
@stevenbal stevenbal marked this pull request as ready for review October 23, 2025 10:59
and add indexes to benefit from this denormalized column
This took the local duration for 3.8 million records (PG running with 2 CPUs and 4 GB of memory) from about 2 hours to 35 minutes

* run in parallel with 4 workers and a smaller batch size
* remove the index from _object_type before the data migration and add it back right after
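
The commit message does not say how the work is divided among the 4 workers. One common approach, shown here purely as an assumption, is to split the primary-key range into contiguous chunks and hand one chunk to each worker. The helper names `split_id_range` and `backfill_range` are hypothetical, and the worker is a stand-in that only counts ids instead of running SQL on its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor


def split_id_range(min_id, max_id, workers):
    """Split the inclusive range [min_id, max_id] into contiguous chunks, one per worker."""
    total = max_id - min_id + 1
    base, extra = divmod(total, workers)
    chunks, start = [], min_id
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            continue
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def backfill_range(id_range):
    """Stand-in for a worker: a real one would backfill rows with ids in this range."""
    low, high = id_range
    return high - low + 1


chunks = split_id_range(1, 10, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    processed = sum(pool.map(backfill_range, chunks))

print(chunks, processed)
```

Non-overlapping id ranges keep the workers from contending for the same rows, which is one reason to prefer this over several workers all selecting "the next batch of NULL rows".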
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch from cfe2fd5 to ec551dc on October 27, 2025 08:51
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 3eba5b1 to c07d91b, on October 28, 2025 09:03
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch from c07d91b to 25f2c71 on October 28, 2025 09:27
@stevenbal stevenbal mentioned this pull request Oct 28, 2025
13 tasks
because otherwise timeouts are raised with multiple workers
@stevenbal stevenbal force-pushed the issue/677-get-objects-performance branch from 25f2c71 to c0bcac7 on October 28, 2025 09:52
@stevenbal stevenbal merged commit ae2448c into master Oct 28, 2025
27 checks passed
@stevenbal stevenbal deleted the issue/677-get-objects-performance branch October 28, 2025 10:09
Development

Successfully merging this pull request may close these issues.

GET on objects without data_attr filters bad performance

4 participants