:zap: [#677] Denormalize ObjectRecord by adding _object_type by stevenbal · Pull Request #678 · maykinmedia/objects-api

stevenbal · 2025-09-26T12:16:13Z

Fixes #677

Reduces the number of joins by denormalizing _object_type on ObjectRecord. Indexes have also been added to improve performance.
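To illustrate the read-side change, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table and index names (`objecttype`, `object`, `objectrecord`, `objectrecord_object_type_idx`) are hypothetical simplifications, not the actual Django-generated schema. It shows that filtering records by object type through the parent object needs a join, while the denormalized `_object_type_id` column answers the same question from a single table.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE objecttype (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE object (
    id INTEGER PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES objecttype(id)
);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER REFERENCES objecttype(id)  -- denormalized copy
);
CREATE INDEX objectrecord_object_type_idx ON objectrecord (_object_type_id);
"""

# Before: the type is only reachable through the parent object (one join).
VIA_JOIN = """
SELECT r.id FROM objectrecord r
JOIN object o ON o.id = r.object_id
WHERE o.object_type_id = ?
ORDER BY r.id
"""

# After: filter on the denormalized column of objectrecord itself.
VIA_DENORMALIZED = """
SELECT r.id FROM objectrecord r
WHERE r._object_type_id = ?
ORDER BY r.id
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO objecttype VALUES (?, ?)", [(1, "tree"), (2, "bench")])
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)])
    # Each record stores its object's type next to the object reference.
    conn.executemany(
        "INSERT INTO objectrecord VALUES (?, ?, ?)",
        [(100, 10, 1), (101, 10, 1), (102, 11, 2), (103, 12, 1)],
    )
    conn.commit()
    return conn


def record_ids(conn: sqlite3.Connection, sql: str, object_type_id: int) -> list[int]:
    return [row[0] for row in conn.execute(sql, (object_type_id,))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(record_ids(conn, VIA_JOIN, 1), record_ids(conn, VIA_DENORMALIZED, 1))
```

Both queries return the same records; the denormalized one just does not need to touch the `object` table, at the cost of keeping `_object_type_id` in sync on writes.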

A data migration was added to backfill this denormalized column, which took about 1 hour to complete locally for 3.8 million ObjectRecords.

Main: https://github.com/maykinmedia/objects-api/actions/runs/18089517249/job/51466666860#step:8:646
After changes: https://github.com/maykinmedia/objects-api/actions/runs/18124724855/job/51577259367?pr=678#step:8:620

Changes

Denormalize ObjectRecord by adding _object_type and add indexes to benefit from this denormalized column
Use non superuser token and add perftest for objecttype filter

codecov-commenter · 2025-09-26T12:18:16Z

Codecov Report

❌ Patch coverage is 98.59155% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.96%. Comparing base (535ca32) to head (c0bcac7).

Files with missing lines	Patch %	Lines
src/objects/api/serializers.py	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #678      +/-   ##
==========================================
+ Coverage   83.16%   83.96%   +0.80%     
==========================================
  Files         128      131       +3     
  Lines        2429     2488      +59     
  Branches      193      198       +5     
==========================================
+ Hits         2020     2089      +69     
+ Misses        369      354      -15     
- Partials       40       45       +5


sdegroot · 2025-10-01T06:44:29Z

src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py

+        if not ids:
+            break
+
+        ObjectRecord.objects.filter(id__in=ids).update(


A single batch of documents in our database takes about 18 seconds. With a total of 3 million records, this operation would take about 90 minutes. That is a bit much for a database migration. Since this migration does nothing more than copy data from one table to another, would it be an idea to simply run an INSERT INTO ... SELECT ... FROM?
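For rows that already exist, the "insert into select from" idea translates into a single set-based UPDATE that copies each record's type from its parent object; in PostgreSQL this is usually written as `UPDATE ... FROM`. As a rough scale check, 90 minutes at 18 seconds per batch is about 300 batches (5400 s / 18 s), i.e. roughly 10,000 records per batch for 3 million records. The sketch below uses SQLite with a portable correlated subquery instead of `UPDATE ... FROM`; the schema is a hypothetical simplification, not the project's real tables.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER
);
"""

# A single set-based statement instead of a Python loop over batches:
# every record that still lacks a type copies it from its parent object.
# (PostgreSQL would usually spell this as UPDATE ... FROM object ...)
BACKFILL_ALL = """
UPDATE objectrecord
SET _object_type_id = (
    SELECT o.object_type_id FROM object o WHERE o.id = objectrecord.object_id
)
WHERE _object_type_id IS NULL
"""


def backfill_set_based(conn: sqlite3.Connection) -> int:
    """Backfill every NULL row in one statement; return the number of rows updated."""
    with conn:  # commit on success, roll back on error
        return conn.execute(BACKFILL_ALL).rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, 7), (2, 8)])
    conn.executemany("INSERT INTO objectrecord (id, object_id) VALUES (?, ?)",
                     [(10, 1), (11, 1), (12, 2)])
    print(backfill_set_based(conn))
```

The trade-off is that one statement means one long transaction over the whole table, which is exactly what the later discussion about locking and online migrations is concerned with.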

src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py

src/objects/core/migrations/0034_alter_objectrecord__object_type_and_more.py

src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py

sdegroot · 2025-10-02T10:12:01Z

src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py

+def backfill_object_type_batch(apps, cursor, batch_size):
+    cursor.execute(
+        """
+        WITH batch AS (


Nice, that'll work. Though I see the index is not applied yet. With 3 million objects, I think it will have to do table scans on the _object_type_id column for every batch(?). Why not add the index before the data migration?

I figured that doing a bulk update with the index would be slower, because it would have to update this index as well, but I think you're right that in that case it will do full table scans (due to the changes in filtering).

I'll check which is faster locally.

With the index it seems to be a little bit faster, so I've changed it to add the index before the data migration.
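The question above is about the access path each batch uses to find rows that still need backfilling. A minimal SQLite sketch (hypothetical table and index names; SQLite's query planner stands in for PostgreSQL's) compares `EXPLAIN QUERY PLAN` for that lookup with and without an index on `_object_type_id`, then runs one batch with a CTE similar in shape to the `WITH batch AS (` snippet in the migration. The plan only shows which access path is chosen; whether the index pays for its own maintenance during a bulk update still has to be measured, as was done locally here.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite's planner stands in for PostgreSQL's.
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER
);
"""

# How each batch finds its work: records whose type is still NULL.
FIND_BATCH = "SELECT id FROM objectrecord WHERE _object_type_id IS NULL LIMIT ?"

# One batch, selected in a CTE and updated from the parent object.
BACKFILL_BATCH = """
WITH batch AS (
    SELECT id FROM objectrecord WHERE _object_type_id IS NULL LIMIT ?
)
UPDATE objectrecord
SET _object_type_id = (
    SELECT o.object_type_id FROM object o WHERE o.id = objectrecord.object_id
)
WHERE id IN (SELECT id FROM batch)
"""


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def compare_plans_and_run_batch(batch_size: int = 3) -> tuple[str, str, int]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, 7), (2, 8)])
    conn.executemany("INSERT INTO objectrecord (object_id) VALUES (?)", [(1,), (2,)] * 5)
    conn.commit()
    # No index yet: the planner has to scan objectrecord to find NULLs.
    without_index = query_plan(conn, FIND_BATCH, (batch_size,))
    conn.execute("CREATE INDEX objectrecord_ot_idx ON objectrecord (_object_type_id)")
    # With the index it can look up the NULL entries directly.
    with_index = query_plan(conn, FIND_BATCH, (batch_size,))
    # rowcount may not be reported for statements starting with WITH in
    # Python's sqlite3 module, so count the changes via total_changes instead.
    before = conn.total_changes
    with conn:
        conn.execute(BACKFILL_BATCH, (batch_size,))
    return without_index, with_index, conn.total_changes - before


if __name__ == "__main__":
    for line in compare_plans_and_run_batch():
        print(line)
```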

@stevenbal I do not see the index in the source code, is that the visualisation here or is it indeed still applied afterwards?

In production it takes about 5 minutes per batch. There is no index on objectrecord._object_type_id, so it has to do a full scan of the whole table (3.6 million records) for every batch. Note that the JSON data in each record is about 1 KB. Since PostgreSQL is a row-oriented database, without an index it loads whole rows into memory, so it has to scan about 4 GB of data on every iteration.

The migration is now running almost an hour and we have set the maximum to 90 minutes (we hoped that would be enough) :(

sdegroot · 2025-10-02T10:14:55Z

src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py

+
+def forward(apps, schema_editor):
+    with connection.cursor() as cursor:
+        while True:


This will work with online systems, nice. Theoretically, however, this may result in two potential problems:

* it may run forever if there is a constant influx of new objects
* it may still result in records with a null value, because other pods may keep writing to the table even after this migration has completed (thus still requiring an offline migration)

Like I said, this is more of a theoretical thing, but it will happen if you are doing online migrations on busy systems. It requires a warning at least.

Actually, I see you change the field to NOT NULL after this migration. That will effectively prevent any null values, great. That operation may fail, though, if another pod without this change inserts an object between 0033 and 0034. Chances are slim but not non-existent.
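One way to address both concerns, sketched here under a hypothetical SQLite schema (not necessarily what migrations 0033/0034 do): snapshot the highest record id when the backfill starts and only process ids up to that bound, so a constant stream of inserts cannot keep the loop alive; then count the remaining NULLs before making the column NOT NULL, since rows written afterwards by pods still running the old code would make that step fail.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER
);
"""

# Backfill one id range; rows outside (lower, upper] are left alone.
BACKFILL_RANGE = """
UPDATE objectrecord
SET _object_type_id = (
    SELECT o.object_type_id FROM object o WHERE o.id = objectrecord.object_id
)
WHERE id > ? AND id <= ? AND _object_type_id IS NULL
"""


def bounded_backfill(conn: sqlite3.Connection, batch_size: int) -> int:
    """Backfill only the records that existed when the backfill started.

    Snapshotting MAX(id) first means a constant influx of new records
    cannot keep the loop running forever. Returns the snapshot bound.
    """
    (upper,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM objectrecord").fetchone()
    lower = 0
    while lower < upper:
        with conn:  # commit each batch separately
            conn.execute(BACKFILL_RANGE, (lower, min(lower + batch_size, upper)))
        lower += batch_size
    return upper


def remaining_nulls(conn: sqlite3.Connection) -> int:
    """Count records still missing a type, e.g. ones inserted afterwards by
    pods running the old code. A NOT NULL change would fail while this is > 0."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM objectrecord WHERE _object_type_id IS NULL"
    ).fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, 7), (2, 8)])
    conn.executemany("INSERT INTO objectrecord (object_id) VALUES (?)", [(1,), (2,)] * 3)
    conn.commit()
    bounded_backfill(conn, batch_size=4)
    # Simulate an old pod inserting a record without the denormalized value.
    conn.execute("INSERT INTO objectrecord (object_id) VALUES (1)")
    conn.commit()
    print(remaining_nulls(conn))  # 1: do not apply NOT NULL yet
```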

Hmm, yeah, that's true. If this makes it into a release I'll make sure there's a warning mentioning this.

I only just noticed: does this also run in a single transaction? If you run all batches in a single transaction, it will effectively lock the whole objects table against writes until the migration is done. That is assuming there are no other transactions holding a write lock on any of these records.
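For context: Django runs each migration inside a single transaction on databases with transactional DDL such as PostgreSQL; setting `atomic = False` on the `Migration` class turns that off, after which each batch can be committed on its own (for example with one `transaction.atomic()` block per batch). Whether migration 0033 does this is not shown here. The SQLite sketch below (hypothetical schema and helper names) commits after every batch and uses a second connection to show that progress becomes visible to other transactions batch by batch; in PostgreSQL this would also mean each batch holds its row locks only until its own commit.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER
);
"""

# One batch: pick up to ? untyped records and copy the type from their object.
BACKFILL_BATCH = """
UPDATE objectrecord
SET _object_type_id = (
    SELECT o.object_type_id FROM object o WHERE o.id = objectrecord.object_id
)
WHERE id IN (
    SELECT id FROM objectrecord WHERE _object_type_id IS NULL LIMIT ?
)
"""

COUNT_DONE = "SELECT COUNT(*) FROM objectrecord WHERE _object_type_id IS NOT NULL"


def make_demo_db(path: str, n_records: int) -> None:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, 7), (2, 8)])
    conn.executemany(
        "INSERT INTO objectrecord (object_id) VALUES (?)",
        [(1 + i % 2,) for i in range(n_records)],
    )
    conn.commit()
    conn.close()


def backfill_committing_each_batch(path: str, batch_size: int) -> list[int]:
    """Commit after every batch and record the progress that a separate
    connection (standing in for another transaction) can see after each commit."""
    worker = sqlite3.connect(path)
    observer = sqlite3.connect(path)
    visible_progress = []
    try:
        while True:
            with worker:  # one short transaction per batch
                updated = worker.execute(BACKFILL_BATCH, (batch_size,)).rowcount
            if updated == 0:
                break
            # fetchall() finishes the read so the observer holds no lock.
            visible_progress.append(observer.execute(COUNT_DONE).fetchall()[0][0])
    finally:
        worker.close()
        observer.close()
    return visible_progress


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        make_demo_db(path, 10)
        print(backfill_committing_each_batch(path, 4))  # [4, 8, 10]
```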

github-actions · 2025-10-13T13:26:31Z

Bencher Report

Branch	issue/677-get-objects-performance
Testbed	ubuntu-latest


Benchmark	Result in ms (Δ% vs. baseline)	Baseline (ms)	Upper boundary in ms (result as % of boundary)
performance_test/tests/test_objects_list.py::test_objects_api_list_filter_by_object_type	109.84 (-25.87%)	148.18	155.59 (70.60%)
performance_test/tests/test_objects_list.py::test_objects_api_list_filter_one_result	19.66 (-6.56%)	21.05	22.10 (88.99%)
performance_test/tests/test_objects_list.py::test_objects_api_list_large_page_size_page_1	263.76 (-9.03%)	289.94	304.44 (86.64%)
performance_test/tests/test_objects_list.py::test_objects_api_list_large_page_size_page_5	265.14 (-8.98%)	291.30	305.86 (86.69%)
performance_test/tests/test_objects_list.py::test_objects_api_list_small_page_size_page_20	123.24 (-18.08%)	150.43	157.95 (78.02%)


sdegroot · 2025-10-20T08:30:02Z

@stevenbal a new issue arose with this test release; Theo mentioned it here: #677 (comment)

It seems the UI has been broken since this release (there is no way to change the object). Any idea if this issue is the cause?

sdegroot · 2025-10-20T20:20:18Z

I think we can confirm that this has greatly improved performance. Do you have an update on the duration of the upgrade? Any new improvements in that area?

stevenbal · 2025-10-21T08:05:10Z

@sdegroot I got the duration of the upgrade down locally from 2 hours to about 30-40 minutes.

src/objects/core/models.py


This took the local duration for 3.8 million records (PG running with 2 CPUs and 4 GB of memory) from about 2 hours to 35 minutes:

* run in parallel with 4 workers and a smaller batch size
* remove the index from _object_type before the data migration and add it back right after
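A rough sketch of the two changes listed above, plus the later rule from commit c0bcac7 that falls back to a single worker when connection pooling is enabled. The helper names, the `pooling_enabled` flag and the SQLite schema are hypothetical. SQLite serializes writers, so here the thread pool only demonstrates how the id range is split across workers (each with its own connection), not the speedup PostgreSQL can get from updating disjoint ranges concurrently.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, simplified schema; WAL lets readers and the writer coexist.
SCHEMA = """
PRAGMA journal_mode=WAL;
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type_id INTEGER NOT NULL);
CREATE TABLE objectrecord (
    id INTEGER PRIMARY KEY,
    object_id INTEGER NOT NULL REFERENCES object(id),
    _object_type_id INTEGER
);
CREATE INDEX objectrecord_ot_idx ON objectrecord (_object_type_id);
"""

BACKFILL_RANGE = """
UPDATE objectrecord
SET _object_type_id = (
    SELECT o.object_type_id FROM object o WHERE o.id = objectrecord.object_id
)
WHERE id > ? AND id <= ?
"""


def choose_num_workers(pooling_enabled: bool, requested: int = 4) -> int:
    """Fall back to one worker when connection pooling is enabled."""
    return 1 if pooling_enabled else requested


def id_ranges(max_id: int, batch_size: int) -> list[tuple[int, int]]:
    """Split ids 1..max_id into (lower, upper] batches."""
    return [(lo, min(lo + batch_size, max_id)) for lo in range(0, max_id, batch_size)]


def backfill_range(path: str, bounds: tuple[int, int]) -> int:
    # Each worker thread opens its own connection. BEGIN IMMEDIATE plus a busy
    # timeout makes workers wait for SQLite's single write lock instead of failing.
    conn = sqlite3.connect(path, timeout=30, isolation_level="IMMEDIATE")
    try:
        with conn:
            return conn.execute(BACKFILL_RANGE, bounds).rowcount
    finally:
        conn.close()


def parallel_backfill(path: str, batch_size: int, num_workers: int) -> int:
    conn = sqlite3.connect(path)
    try:
        max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM objectrecord").fetchall()[0][0]
        # Drop the index so the bulk update does not have to maintain it ...
        conn.execute("DROP INDEX IF EXISTS objectrecord_ot_idx")
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            updated = sum(pool.map(lambda b: backfill_range(path, b), id_ranges(max_id, batch_size)))
        # ... and rebuild it once, right after the backfill.
        conn.execute("CREATE INDEX objectrecord_ot_idx ON objectrecord (_object_type_id)")
    finally:
        conn.close()
    return updated


def make_demo_db(path: str, n_records: int) -> None:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, 7), (2, 8)])
    conn.executemany("INSERT INTO objectrecord (object_id) VALUES (?)",
                     [(1 + i % 2,) for i in range(n_records)])
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        make_demo_db(path, 25)
        workers = choose_num_workers(pooling_enabled=False)
        print(parallel_backfill(path, batch_size=10, num_workers=workers))
```

Rebuilding the index once after the backfill replaces many incremental index updates with a single build, which matches the reasoning in the thread; the actual gain depends on the database and has to be measured.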


stevenbal marked this pull request as draft September 26, 2025 12:16

stevenbal force-pushed the issue/677-get-objects-performance branch 4 times, most recently from 75ae054 to 11069a3 September 26, 2025 15:17

stevenbal mentioned this pull request Sep 29, 2025

Prepare Objects API release 3.3.0 #679

Closed

14 tasks

stevenbal force-pushed the issue/677-get-objects-performance branch 11 times, most recently from 5a456c3 to ae2ca7a September 30, 2025 10:07

stevenbal changed the title ~~⚡ [#677] Add ObjectRecord index on object,index,start_at and end_at~~ ⚡ [#677] Denormalize ObjectRecord by adding _object_type Sep 30, 2025

sdegroot suggested changes Oct 1, 2025


stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 31af9a7 to 5408566 October 2, 2025 08:53

sdegroot reviewed Oct 2, 2025


src/objects/core/migrations/0033_objectrecord__backfill_denormalized_fields.py (outdated)

src/objects/core/migrations/0034_alter_objectrecord__object_type_and_more.py (outdated)

stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 7ca76cb to aa51878 October 2, 2025 09:54

stevenbal requested a review from sdegroot October 2, 2025 10:06

sdegroot reviewed Oct 2, 2025


stevenbal mentioned this pull request Oct 2, 2025

🔖 [#679] Release version 3.3.0 #681

Merged

sdegroot mentioned this pull request Oct 20, 2025

GET on objects without data_attr filters bad performance #677

Closed

stevenbal force-pushed the issue/677-get-objects-performance branch from 1098964 to cfe2fd5 October 23, 2025 10:59

stevenbal requested review from Floris272 and danielmursa-dev and removed request for sdegroot October 23, 2025 10:59

stevenbal marked this pull request as ready for review October 23, 2025 10:59

danielmursa-dev reviewed Oct 23, 2025


src/objects/core/models.py

stevenbal requested a review from danielmursa-dev October 24, 2025 06:36

stevenbal mentioned this pull request Oct 24, 2025

✅ Use non superuser token and add perftest for objecttype filter #690

Merged

danielmursa-dev approved these changes Oct 24, 2025


stevenbal added 3 commits October 27, 2025 09:50

⚡ [#677] Denormalize ObjectRecord by adding _object_type

0e6d084

and add indexes to benefit from this denormalized column

✅ [#677] Add tests for denormalized ObjectRecord._object_type

555b9b1

stevenbal force-pushed the issue/677-get-objects-performance branch from cfe2fd5 to ec551dc October 27, 2025 08:51

➕ [#677] Add tqdm to show progress indicators for scripts

4e3bcc4

stevenbal force-pushed the issue/677-get-objects-performance branch 3 times, most recently from 3eba5b1 to c07d91b October 28, 2025 09:03

stevenbal added 2 commits October 28, 2025 10:13

🔊 [#677] Show _object_type migration backfill progress with tqdm

3676590

🐛 [#677] Ensure new objectrecords can be added via admin

499635d

stevenbal force-pushed the issue/677-get-objects-performance branch from c07d91b to 25f2c71 October 28, 2025 09:27

stevenbal mentioned this pull request Oct 28, 2025

Prepare release 3.4.0 #691

Closed

13 tasks

🐳 [#677] Set num data migration workers to 1 if pooling is enabled

c0bcac7

because otherwise timeouts are raised with multiple workers

stevenbal force-pushed the issue/677-get-objects-performance branch from 25f2c71 to c0bcac7 October 28, 2025 09:52

stevenbal merged commit ae2448c into master Oct 28, 2025
27 checks passed

stevenbal deleted the issue/677-get-objects-performance branch October 28, 2025 10:09


4 participants
