@jzunigax2 jzunigax2 commented Feb 5, 2026

What

Adds a batched migration that renames duplicate folders, i.e. folders with deleted = false that share the same (parent_uuid, plain_name) pair.

Why

The unique index on folders(parent_uuid, plain_name) WHERE deleted = false (blocked PR: #882) cannot be applied because existing duplicate entries violate the constraint:

  ERROR: could not create unique index "folders_parentuuid_plainname_unique"
  DETAIL: Key (parent_uuid, plain_name)=(<uuid>-..., images) is duplicated.

This migration resolves those conflicts by renaming duplicates, making the data safe for the unique index.

How

  1. Uses GROUP BY parent_uuid, plain_name HAVING COUNT(*) > 1 to find only duplicate groups
  2. Keeps the folder with MIN(id) (oldest) unchanged
  3. Renames duplicates via plain_name || '_' || id::text
  4. Processes in batches of 500 groups with a 1s sleep between batches
  5. Retries up to 10 times on transient errors
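
The steps above can be sketched roughly as follows. This is a minimal illustration, not the migration from this PR: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, splits the work into a SELECT plus per-group UPDATEs instead of a single CTE-driven UPDATE, and the helper names (make_demo_db, rename_batch, run_migration) and the choice of sqlite3.OperationalError as the "transient" error are assumptions made for the example.

```python
import sqlite3
import time

BATCH_SIZE = 500   # duplicate groups per batch, as in the description
MAX_RETRIES = 10   # retry budget, as in the description

# Steps 1-2: find duplicate groups and the row to keep (lowest id).
# The `removed = 0` filter follows the review feedback on this PR.
FIND_GROUPS_SQL = """
SELECT parent_uuid, plain_name, MIN(id) AS id_to_keep
FROM folders
WHERE deleted = 0 AND removed = 0
  AND parent_uuid IS NOT NULL AND plain_name IS NOT NULL
GROUP BY parent_uuid, plain_name
HAVING COUNT(*) > 1
LIMIT ?
"""

# Step 3: rename every other live row of one group to "<name>_<id>".
RENAME_SQL = """
UPDATE folders
SET plain_name = plain_name || '_' || CAST(id AS TEXT)
WHERE parent_uuid = ? AND plain_name = ? AND id <> ?
  AND deleted = 0 AND removed = 0
"""


def make_demo_db():
    """Hypothetical, heavily simplified `folders` table for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE folders (id INTEGER PRIMARY KEY, parent_uuid TEXT,"
        " plain_name TEXT, deleted INTEGER NOT NULL DEFAULT 0,"
        " removed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO folders (id, parent_uuid, plain_name, deleted)"
        " VALUES (?, ?, ?, ?)",
        [
            (1, "p1", "images", 0),  # lowest id in its group: kept
            (2, "p1", "images", 0),  # duplicate: renamed
            (3, "p1", "images", 1),  # deleted: ignored
            (4, "p2", "images", 0),  # different parent: not a duplicate
            (5, "p1", "docs", 0),    # unique name: untouched
            (6, "p1", "images", 0),  # duplicate: renamed
        ],
    )
    conn.commit()
    return conn


def rename_batch(conn, limit=BATCH_SIZE):
    """Resolve up to `limit` duplicate groups; return how many were found."""
    groups = conn.execute(FIND_GROUPS_SQL, (limit,)).fetchall()
    conn.executemany(RENAME_SQL, groups)
    conn.commit()
    return len(groups)


def run_migration(conn, batch_size=BATCH_SIZE, sleep_s=1.0,
                  max_retries=MAX_RETRIES):
    """Steps 4-5: loop over batches, sleeping between them and retrying."""
    total = 0
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                processed = rename_batch(conn, batch_size)
                break
            except sqlite3.OperationalError:  # assumed "transient" error
                conn.rollback()
                if attempt == max_retries:
                    raise
                time.sleep(sleep_s)
        if processed == 0:
            return total  # no duplicate groups left
        total += processed
        time.sleep(sleep_s)


conn = make_demo_db()
print(run_migration(conn, sleep_s=0))  # 1 (one duplicate group resolved)
print(dict(conn.execute("SELECT id, plain_name FROM folders ORDER BY id")))
# {1: 'images', 2: 'images_2', 3: 'images', 4: 'images', 5: 'docs', 6: 'images_6'}
```

Because each batch is committed on its own, the loop can be interrupted and resumed: a re-run simply finds whatever duplicate groups remain.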

EXPLAIN ANALYZE

QUERY PLAN                                                                                                                               |
-----------------------------------------------------------------------------------------------------------------------------------------+
Update on folders f  (cost=47.33..89.12 rows=2 width=583) (actual time=2.925..87.395 rows=400 loops=1)                                   |
  ->  Hash Join  (cost=47.33..89.12 rows=2 width=583) (actual time=1.803..3.895 rows=400 loops=1)                                        |
        Hash Cond: ((f.parent_uuid = dg.parent_uuid) AND ((f.plain_name)::text = (dg.plain_name)::text))                                 |
        Join Filter: (f.id <> dg.id_to_keep)                                                                                             |
        Rows Removed by Join Filter: 200                                                                                                 |
        ->  Seq Scan on folders f  (cost=0.00..38.06 rows=706 width=35) (actual time=0.064..0.766 rows=706 loops=1)                      |
              Filter: (NOT deleted)                                                                                                      |
        ->  Hash  (cost=46.79..46.79 rows=36 width=82) (actual time=1.672..1.676 rows=200 loops=1)                                       |
              Buckets: 1024  Batches: 1  Memory Usage: 30kB                                                                              |
              ->  Subquery Scan on dg  (cost=45.09..46.79 rows=36 width=82) (actual time=1.412..1.559 rows=200 loops=1)                  |
                    ->  Limit  (cost=45.09..46.43 rows=36 width=29) (actual time=0.933..1.032 rows=200 loops=1)                          |
                          ->  HashAggregate  (cost=45.09..46.43 rows=36 width=29) (actual time=0.929..1.016 rows=200 loops=1)            |
                                Group Key: folders.parent_uuid, folders.plain_name                                                       |
                                Filter: (count(*) > 1)                                                                                   |
                                Batches: 1  Memory Usage: 109kB                                                                          |
                                Rows Removed by Filter: 104                                                                              |
                                ->  Seq Scan on folders  (cost=0.00..38.06 rows=703 width=29) (actual time=0.004..0.243 rows=704 loops=1)|
                                      Filter: ((NOT deleted) AND (parent_uuid IS NOT NULL) AND (plain_name IS NOT NULL))                 |
                                      Rows Removed by Filter: 2                                                                          |
Planning Time: 3.143 ms                                                                                                                  |
Trigger mark_deleted_files_on_delete_v3: time=3.040 calls=400                                                                            |
Trigger update_look_up_table_after_folder_updated: time=43.770 calls=400                                                                 |
Execution Time: 132.862 ms                                                                                                               |

WITH duplicate_groups AS (
SELECT parent_uuid, plain_name, MIN(id) as id_to_keep
FROM folders
WHERE deleted = false
Member

Add removed = false here as well: both the index we want to add and the issue we are seeing concern only existing (not removed) folders that share the same name inside the same parent folder.

Contributor Author

done

WHERE f.parent_uuid = dg.parent_uuid
AND f.plain_name = dg.plain_name
AND f.id != dg.id_to_keep
AND f.deleted = false
Member

Same here

Contributor Author

done

@jzunigax2 force-pushed the chore/duplicate-folder-rename-script branch from adcaf58 to 0d26e88 on February 5, 2026 12:51
@jzunigax2 requested a review from sg-gs on February 5, 2026 12:52
sg-gs previously approved these changes Feb 5, 2026
Member

@sg-gs sg-gs left a comment

FYI: Analysed the count and there are 185 cases where this is happening. Running migration @jzunigax2

@jzunigax2
Contributor Author

@sg-gs I added a temporary support index, which produced the following EXPLAIN result locally:

QUERY PLAN                                                                                                                                                                           |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Update on folders f  (cost=2.85..9055.30 rows=1 width=585) (actual time=22.911..1662.107 rows=2006 loops=1)                                                                          |
  ->  Nested Loop  (cost=2.85..9055.30 rows=1 width=585) (actual time=9.611..448.157 rows=2006 loops=1)                                                                              |
        ->  Subquery Scan on dg  (cost=2.42..1376.29 rows=1000 width=86) (actual time=9.412..412.785 rows=1000 loops=1)                                                              |
              ->  Limit  (cost=2.42..1366.29 rows=1000 width=31) (actual time=9.141..407.938 rows=1000 loops=1)                                                                      |
                    ->  GroupAggregate  (cost=2.42..141118.62 rows=103468 width=31) (actual time=9.136..405.275 rows=1000 loops=1)                                                   |
                          Group Key: folders.parent_uuid, folders.plain_name                                                                                                         |
                          Filter: (count(*) > 1)                                                                                                                                     |
                          ->  Incremental Sort  (cost=2.42..132932.35 rows=430621 width=31) (actual time=9.103..392.934 rows=3007 loops=1)                                           |
                                Sort Key: folders.parent_uuid, folders.plain_name                                                                                                    |
                                Presorted Key: folders.parent_uuid                                                                                                                   |
                                Full-sort Groups: 88  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB                                                                |
                                ->  Index Scan using folders_parent_uuid_index on folders  (cost=0.42..120321.73 rows=430621 width=31) (actual time=0.980..380.954 rows=3039 loops=1)|
                                      Index Cond: (parent_uuid IS NOT NULL)                                                                                                          |
                                      Filter: ((NOT deleted) AND (plain_name IS NOT NULL))                                                                                           |
        ->  Index Scan using folders_parentuuid_plainname_not_deleted_support_index on folders f  (cost=0.42..7.67 rows=1 width=37) (actual time=0.018..0.023 rows=2 loops=1000)     |
              Index Cond: ((parent_uuid = dg.parent_uuid) AND ((plain_name)::text = (dg.plain_name)::text))                                                                          |
              Filter: ((NOT deleted) AND (NOT removed) AND (id <> dg.id_to_keep))                                                                                                    |
              Rows Removed by Filter: 1                                                                                                                                              |
Planning Time: 34.783 ms                                                                                                                                                             |
Trigger update_look_up_table_after_folder_updated: time=781.462 calls=2006                                                                                                           |
Execution Time: 2449.393 ms                                                                                                                                                          |

It gets rid of the sequential scan on folders and instead leverages the supporting index.
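
A rough way to see the same effect in miniature, using Python's built-in sqlite3 rather than PostgreSQL: the definition of the real support index is not shown in this conversation, so the plain composite index on (parent_uuid, plain_name) below, its name folders_support_idx, and the simplified table are all assumptions for illustration. SQLite's planner and plan format also differ from PostgreSQL's, so this only demonstrates the scan-to-index switch, not the plan above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real `folders` table.
conn.execute(
    "CREATE TABLE folders (id INTEGER PRIMARY KEY, parent_uuid TEXT,"
    " plain_name TEXT, deleted INTEGER NOT NULL DEFAULT 0,"
    " removed INTEGER NOT NULL DEFAULT 0)"
)

# The per-group lookup the UPDATE performs for each duplicate group.
LOOKUP_SQL = (
    "SELECT id FROM folders"
    " WHERE parent_uuid = ? AND plain_name = ?"
    " AND deleted = 0 AND removed = 0"
)


def query_plan(conn):
    """Return SQLite's plan for the lookup as a single string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("p1", "images")
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # full table scan: no usable index yet

# Assumed shape of a support index: the two columns used by the join.
conn.execute(
    "CREATE INDEX folders_support_idx ON folders (parent_uuid, plain_name)"
)

plan_after = query_plan(conn)   # the lookup now searches via the index

print(plan_before)
print(plan_after)
```

The exact plan text depends on the SQLite version, but the second plan should name folders_support_idx while the first cannot. A temporary index like this would be dropped once the migration has run.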

sg-gs previously approved these changes Feb 5, 2026
Member

@sg-gs sg-gs left a comment

Great, let's see how it performs. Executing migration @jzunigax2
