115 changes: 115 additions & 0 deletions migrations/20260122030036-cleanup-duplicate-backup-folders.js
@@ -0,0 +1,115 @@
'use strict';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const MAX_ATTEMPTS = 10;
const BATCH_SIZE = 100;
const SLEEP_TIME_MS = 5000;

/** @type {import('sequelize-cli').Migration} */
module.exports = {
  async up(queryInterface) {
    let totalDeleted = 0;
    let batchCount = 0;
    let attempts = 0;

    console.info(`Batch size: ${BATCH_SIZE} duplicate groups per batch`);

    console.info('Starting cleanup of duplicate backup folders...');

    const deleteQuery = `
      WITH duplicate_groups AS (
        SELECT
          plain_name,
          bucket,
          user_id,
          MIN(id) as id_to_keep
        FROM folders
        WHERE
          created_at >= '2025-12-17 14:16:00'
          AND created_at <= '2026-01-05 21:50:00'
          AND parent_id IS NULL
          AND parent_uuid IS NULL
          AND deleted = false
          AND removed = false
          AND plain_name IS NOT NULL
        GROUP BY plain_name, bucket, user_id
        HAVING COUNT(*) > 1
        LIMIT ${BATCH_SIZE}
      ),
Comment on lines +20 to +38
Contributor Author

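This CTE picks the folder to keep from each duplicate group: the one with the lowest id, i.e. the oldest. So we should be getting 905 folders from this CTE (in total across batches, since each run is capped by the LIMIT).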
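To make the grouping concrete, here is a minimal in-memory sketch of what this CTE computes. It assumes ids grow with creation time, so `MIN(id)` is the oldest row. The `findDuplicateGroups` helper and the sample rows are hypothetical and only illustrate the GROUP BY / HAVING / LIMIT logic; the `created_at` window is left out.

```javascript
// Hypothetical in-memory equivalent of the duplicate_groups CTE.
// Field names mirror the SQL columns; the created_at window is omitted.
function findDuplicateGroups(folders, batchSize = 100) {
  const groups = new Map();
  for (const f of folders) {
    // Same row filters as the WHERE clause (root, not deleted, named).
    const isLiveRoot =
      f.parent_id === null &&
      f.parent_uuid === null &&
      !f.deleted &&
      !f.removed &&
      f.plain_name !== null;
    if (!isLiveRoot) continue;

    // GROUP BY plain_name, bucket, user_id
    const key = JSON.stringify([f.plain_name, f.bucket, f.user_id]);
    const group = groups.get(key) ?? {
      plain_name: f.plain_name,
      bucket: f.bucket,
      user_id: f.user_id,
      id_to_keep: f.id,
      count: 0,
    };
    group.id_to_keep = Math.min(group.id_to_keep, f.id); // MIN(id)
    group.count += 1;
    groups.set(key, group);
  }
  // HAVING COUNT(*) > 1, then LIMIT batchSize
  return [...groups.values()].filter((g) => g.count > 1).slice(0, batchSize);
}

// Illustrative data: three "Laptop" roots and one "Phone" root for one user.
const root = {
  parent_id: null,
  parent_uuid: null,
  deleted: false,
  removed: false,
  bucket: 'b1',
  user_id: 7,
};
const sampleFolders = [
  { ...root, id: 10, plain_name: 'Laptop' },
  { ...root, id: 11, plain_name: 'Laptop' },
  { ...root, id: 12, plain_name: 'Laptop' },
  { ...root, id: 13, plain_name: 'Phone' },
];
const duplicateGroups = findDuplicateGroups(sampleFolders);
console.log(duplicateGroups.map((g) => g.id_to_keep));
```

With this sample data only the "Laptop" group has more than one row, so the result is a single group whose `id_to_keep` is 10.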

      folders_to_delete AS (
        SELECT f.id
        FROM folders f
        INNER JOIN duplicate_groups dg
          ON f.plain_name = dg.plain_name
          AND f.bucket = dg.bucket
          AND f.user_id = dg.user_id
        WHERE
          f.id != dg.id_to_keep
          AND NOT EXISTS (
            SELECT 1
            FROM files
            WHERE folder_id = f.id
              AND status != 'DELETED'
          )
          AND NOT EXISTS (
            SELECT 1
            FROM folders child
            WHERE child.parent_uuid = f.uuid
              AND child.deleted = false
          )
      )
Comment on lines 39 to 60
Contributor Author

This joins all folders back to their duplicate group by matching (plain_name, bucket, user_id). So if a group has 3 duplicates, the JOIN produces 3 rows, each carrying that group's id_to_keep. We then keep only the folders where:

  • id != id_to_keep (not the oldest/keeper)
  • there are no direct files with status != 'DELETED'
  • there are no child folders with deleted = false

This should return 5863 folders.
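The filter can be sketched in memory as follows. This is only an illustration: `selectFoldersToDelete`, `keyOf`, and the sample rows (including the `'EXISTS'` status value) are hypothetical, and it assumes `status` is never null. Like the SQL as written, it matches folders on the group key alone and does not re-apply the root or date filters from the first CTE.

```javascript
// Hypothetical in-memory equivalent of the folders_to_delete CTE.
const keyOf = (row) => JSON.stringify([row.plain_name, row.bucket, row.user_id]);

function selectFoldersToDelete(folders, files, groups) {
  const keepByKey = new Map(groups.map((g) => [keyOf(g), g.id_to_keep]));
  return folders
    .filter((f) => {
      // INNER JOIN on (plain_name, bucket, user_id)
      const idToKeep = keepByKey.get(keyOf(f));
      if (idToKeep === undefined || f.id === idToKeep) return false;
      // NOT EXISTS (file in this folder with status != 'DELETED')
      const hasLiveFile = files.some(
        (file) => file.folder_id === f.id && file.status !== 'DELETED',
      );
      // NOT EXISTS (child folder with deleted = false)
      const hasLiveChild = folders.some(
        (child) => child.parent_uuid === f.uuid && !child.deleted,
      );
      return !hasLiveFile && !hasLiveChild;
    })
    .map((f) => f.id);
}

// Illustrative data: one group of four "Laptop" folders, keeper is id 10.
const groups = [{ plain_name: 'Laptop', bucket: 'b1', user_id: 7, id_to_keep: 10 }];
const base = {
  plain_name: 'Laptop',
  bucket: 'b1',
  user_id: 7,
  parent_uuid: null,
  deleted: false,
};
const folders = [
  { ...base, id: 10, uuid: 'u10' }, // keeper
  { ...base, id: 11, uuid: 'u11' }, // duplicate holding only a deleted file
  { ...base, id: 12, uuid: 'u12' }, // duplicate that still holds a file
  { ...base, id: 13, uuid: 'u13' }, // duplicate with a live child folder
  { ...base, id: 14, uuid: 'u14', plain_name: 'Docs', parent_uuid: 'u13' },
];
const files = [
  { folder_id: 12, status: 'EXISTS' },
  { folder_id: 11, status: 'DELETED' },
];
const toDelete = selectFoldersToDelete(folders, files, groups);
console.log(toDelete);
```

Here only folder 11 qualifies: 10 is the keeper, 12 still has a live file, 13 has a live child, and 14 is not in any duplicate group.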

      UPDATE folders
      SET
        deleted = true,
        deleted_at = NOW(),
        removed = true,
        removed_at = NOW()
      FROM folders_to_delete
      WHERE folders.id = folders_to_delete.id
        AND folders.deleted = false
        AND folders.removed = false
      RETURNING folders.id;
    `;

    let hasMore = true;

    while (hasMore) {
      try {
        const [results] = await queryInterface.sequelize.query(deleteQuery);
        const deletedInBatch = results.length;
        batchCount++;
        totalDeleted += deletedInBatch;
        attempts = 0;

        console.info(
          `Batch ${batchCount}: Deleted ${deletedInBatch} folders (Total: ${totalDeleted})`,
        );

        hasMore = deletedInBatch > 0;

        if (hasMore) {
          await sleep(SLEEP_TIME_MS);
        }
      } catch (err) {
        attempts++;
        console.error(
          `[ERROR]: Error in batch ${batchCount} (attempt ${attempts}/${MAX_ATTEMPTS}): ${err.message}`,
        );

        if (attempts >= MAX_ATTEMPTS) {
          console.error(
            '[ERROR]: Maximum retry attempts reached, exiting migration.',
          );
          break;
        }

        await sleep(SLEEP_TIME_MS);
      }
    }

    console.info('\n=== Cleanup Complete ===');
    console.info(`Total batches processed: ${batchCount}`);
    console.info(`Total folders deleted: ${totalDeleted}`);
  },
  async down() {},
};
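The batch-and-retry loop in `up` could also be factored into a small helper. The sketch below is one possible variant, not the code in this PR: `runBatches` and `runBatch` are hypothetical names, and it assumes one would rather rethrow after the last failed attempt so the migration is reported as failed, whereas the loop above breaks and still logs "Cleanup Complete". Unlike the original, it does not count the final empty batch.

```javascript
// Hypothetical helper: run `runBatch` until it reports 0 affected rows,
// retrying failed batches up to `maxAttempts` consecutive times.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runBatches(
  runBatch,
  { maxAttempts = 10, sleepMs = 5000, sleep = defaultSleep } = {},
) {
  let total = 0;
  let batches = 0;
  let attempts = 0;

  for (;;) {
    try {
      const affected = await runBatch();
      attempts = 0; // a successful batch resets the retry counter
      if (affected === 0) return { total, batches };
      batches += 1;
      total += affected;
    } catch (err) {
      attempts += 1;
      // Assumption: surface the failure instead of finishing silently.
      if (attempts >= maxAttempts) throw err;
    }
    await sleep(sleepMs); // pause between batches and between retries
  }
}
```

In the migration this would be called with something like `() => queryInterface.sequelize.query(deleteQuery).then(([rows]) => rows.length)`, which relies on the same `RETURNING folders.id` result shape the loop above already uses.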
31 changes: 28 additions & 3 deletions src/modules/backups/backup.usecase.spec.ts
@@ -80,17 +80,42 @@ describe('BackupUseCase', () => {
   });

   describe('createDeviceAsFolder', () => {
-    it('When a folder with the same name exists, then it should throw a ConflictException', async () => {
+    it('When a folder with the same plainName exists, then it should throw a ConflictException', async () => {
+      const existingFolder = newFolder({
+        attributes: {
+          plainName: 'Device Folder',
+          bucket: userMocked.backupsBucket,
+        },
+      });
       jest
         .spyOn(folderUseCases, 'getFolders')
-        .mockResolvedValue([{ id: 1, name: 'Device Folder' }] as any);
+        .mockResolvedValue([existingFolder]);

       await expect(
         backupUseCase.createDeviceAsFolder(userMocked, 'Device Folder'),
       ).rejects.toThrow(ConflictException);
     });

-    it('When no folder with the same name exists, then it should create the folder', async () => {
+    it('When checking for duplicates, then it should use plainName (not encrypted name) and filter by bucket', async () => {
+      const getFoldersSpy = jest
+        .spyOn(folderUseCases, 'getFolders')
+        .mockResolvedValue([]);
+      const mockFolder = newFolder();
+      jest
+        .spyOn(folderUseCases, 'createFolderDevice')
+        .mockResolvedValue(mockFolder);
+
+      await backupUseCase.createDeviceAsFolder(userMocked, 'My Device');
+
+      expect(getFoldersSpy).toHaveBeenCalledWith(userMocked.id, {
+        bucket: userMocked.backupsBucket,
+        plainName: 'My Device',
+        deleted: false,
+        removed: false,
+      });
+    });
+
+    it('When no folder with the same plainName exists, then it should create the folder', async () => {
       const mockFolder = newFolder();
       jest.spyOn(folderUseCases, 'getFolders').mockResolvedValue([]);
       jest