Skip to content

Commit ec48cd9

Browse files
committed
perf(models): change extra_data to JSONB
* `JSONB` is a newer and more efficient format for storing JSON in PostgreSQL. Compared to `JSON`, it offers significantly faster querying and indexing support (but is very slightly slower to input). The JSON is decomposed into a binary format rather than being stored in plaintext. * From a high-level perspective, the access of the data is unchanged in Python so this is not a breaking change. However, the new column type supports more querying operations which could be used in the future. * Migrating the column type is quite easy for small tables but can cause performance and stability issues for larger ones. Therefore, a default Alembic migration has been included for the vast majority of use cases, as well as a step-by-step alternative guide for larger instances. Instances with more than ~50k rows in `oauthclient_remoteaccount` are advised not to use the Alembic migration and to instead follow the upgrade guide.
1 parent 2debc7b commit ec48cd9

File tree

5 files changed

+120
-1
lines changed

5 files changed

+120
-1
lines changed

CHANGES.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,12 @@
99
Changes
1010
=======
1111

12+
Version v5.3.0 (released 2025-XX-XX)
13+
14+
- perf(models): changed SQL column type of `extra_data` to `JSONB` instead of `JSON` for PostgreSQL
15+
16+
- This migration could flood the WAL when run on very large `oauthclient_remoteaccount` tables (~50k+ rows) as well as causing a full lock that lasts several minutes and stops users from being able to log in with third-party accounts. This won't happen in the majority of cases. For instances not meeting this size criteria, the automated Alembic migration will suffice. However, for tables above this size, please use instead the step-by-step instructions in the upgrade guide (included in the documentation of this module).
17+
1218
Version v5.2.0 (released 2025-07-17)
1319

1420
- i18n: pulled translations

docs/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ Invenio-OAuthClient.
2222
configuration
2323
usage
2424
examplesapp
25+
upgrading
2526

2627
API Reference
2728
=============

docs/upgrading.rst

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
Upgrading
2+
=========
3+
4+
This page contains specific information on upgrading between certain versions of ``invenio-oauthclient`` where special steps are involved.
5+
6+
======
7+
v5.3.0
8+
======
9+
10+
This version involves an Alembic migration (``1758275763``) moving the ``extra_data`` column from the ``JSON`` type to the ``JSONB`` type (**only for PostgreSQL**).
11+
By doing this, we can improve performance and use more advanced JSON queries.
12+
13+
For the majority of users, this shouldn't be an issue as the migration will be handled automatically when upgrading.
14+
However, users with an ``oauthclient_remoteaccount`` table with ~50k+ rows should use an alternative approach, as this operation could overfill the WAL and affect the stability of the database, as well as creating a full lock for several minutes.
15+
16+
Instead, use these steps:
17+
18+
1. Install the new version of `invenio-oauthclient`, but do not run the Alembic migrations.
19+
20+
2. In an SQL console, run:
21+
22+
.. code-block:: sql
23+
24+
ALTER TABLE oauthclient_remoteaccount ADD COLUMN extra_data_b jsonb;
25+
26+
3. Next, run this query repeatedly until the response indicates that no new rows are being updated.
27+
You can control the batch size depending on your requirements using the ``LIMIT`` value.
28+
29+
.. code-block:: sql
30+
31+
WITH cte AS (
32+
SELECT id
33+
FROM oauthclient_remoteaccount
34+
WHERE extra_data_b IS NULL
35+
ORDER BY id
36+
LIMIT 1000
37+
FOR UPDATE SKIP LOCKED
38+
)
39+
UPDATE oauthclient_remoteaccount r
40+
SET extra_data_b = r.extra_data::jsonb
41+
FROM cte
42+
WHERE r.id = cte.id
43+
RETURNING r.id;
44+
45+
4. Double check there are no rows left to migrate.
46+
47+
.. code-block:: sql
48+
49+
SELECT COUNT(*) FROM oauthclient_remoteaccount WHERE extra_data_b IS NULL;
50+
51+
This should return ``0``. If it does not, continue repeating step 2.
52+
53+
5. Drop and rename the columns in a simultaneous operation (this requires a brief lock but is much faster than the normal migration).
54+
55+
.. code-block:: sql
56+
57+
ALTER TABLE oauthclient_remoteaccount DROP COLUMN extra_data;
58+
ALTER TABLE oauthclient_remoteaccount RENAME COLUMN extra_data_b TO extra_data;
59+
60+
5. Finally, mark the relevant migration as having been manually performed.
61+
62+
.. code-block:: bash
63+
64+
invenio alembic stamp 1758275763
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
#
2+
# This file is part of Invenio.
3+
# Copyright (C) 2016-2018 CERN.
4+
#
5+
# Invenio is free software; you can redistribute it and/or modify it
6+
# under the terms of the MIT License; see LICENSE file for more details.
7+
8+
"""Extra Data JSONB."""
9+
10+
import sqlalchemy as sa
11+
from alembic import op
12+
13+
# revision identifiers, used by Alembic.
14+
revision = "1758275763"
15+
down_revision = "aaa265b0afa6"
16+
branch_labels = ()
17+
depends_on = None
18+
19+
20+
def upgrade():
21+
"""Upgrade database."""
22+
if op.get_context().dialect.name == "postgresql":
23+
op.alter_column(
24+
"oauthclient_remoteaccount",
25+
"extra_data",
26+
type_=sa.dialects.postgresql.JSONB,
27+
postgresql_using="extra_data::text::jsonb",
28+
)
29+
30+
31+
def downgrade():
32+
"""Downgrade database."""
33+
if op.get_context().dialect.name == "postgresql":
34+
op.alter_column(
35+
"oauthclient_remoteaccount",
36+
"extra_data",
37+
type_=sa.dialects.postgresql.JSON,
38+
postgresql_using="extra_data::text::json",
39+
)

invenio_oauthclient/models.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
# here, means previous imports won't break.
1717
from invenio_accounts.models import User, UserIdentity
1818
from invenio_db import db
19+
from sqlalchemy.dialects import postgresql
1920
from sqlalchemy.ext.mutable import MutableDict
2021
from sqlalchemy.orm import backref
2122
from sqlalchemy_utils import JSONType, StringEncryptedType, Timestamp
@@ -45,7 +46,15 @@ class RemoteAccount(db.Model, Timestamp):
4546
client_id = db.Column(db.String(255), nullable=False)
4647
"""Client ID of remote application (defined in OAUTHCLIENT_REMOTE_APPS)."""
4748

48-
extra_data = db.Column(MutableDict.as_mutable(JSONType), nullable=False)
49+
extra_data = db.Column(
50+
MutableDict.as_mutable(
51+
db.JSON()
52+
.with_variant(postgresql.JSONB(), "postgresql")
53+
.with_variant(JSONType(), "sqlite")
54+
.with_variant(JSONType(), "mysql")
55+
),
56+
nullable=False,
57+
)
4958
"""Extra data associated with this linked account."""
5059

5160
#

0 commit comments

Comments
 (0)