Copilot AI commented Nov 21, 2025

When all identical-vernacular duplicates are processed, users need feedback and a choice before the system searches for similar duplicates (which can take minutes on large projects).

Backend Changes

New endpoint FindIdenticalPotentialDuplicates

  • Synchronously returns words with identical vernacular (fast operation)
  • In contrast, the existing FindPotentialDuplicates uses an async get-then-signal (SignalR) pattern for similar matches
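The new endpoint's route template (from the original issue) can be filled in as shown below. This is a minimal sketch, assuming a hypothetical `v1/projects/{projectId}/merge` controller prefix and a hypothetical `buildIdenticalDupsPath` helper. Real frontend code should call the generated `mergeApi.findIdenticalPotentialDuplicates` client rather than build paths by hand.

```typescript
// Hypothetical helper that fills in the route template
// "findidenticaldups/{maxInList:int}/{maxLists:int}/{ignoreProtected:bool}".
// ASSUMPTION: the "v1/projects/{projectId}/merge" prefix is illustrative only.
function buildIdenticalDupsPath(
  projectId: string,
  maxInList: number,
  maxLists: number,
  ignoreProtected: boolean
): string {
  // The {...:int} route constraints only match integers, so reject anything else.
  if (!Number.isInteger(maxInList) || !Number.isInteger(maxLists)) {
    throw new Error("maxInList and maxLists must be integers");
  }
  const prefix = `v1/projects/${encodeURIComponent(projectId)}/merge`;
  // Template interpolation renders the boolean as "true" or "false".
  return `${prefix}/findidenticaldups/${maxInList}/${maxLists}/${ignoreProtected}`;
}
```

Because identical matching is fast, a single request to such a path can return the duplicate lists directly. No SignalR round trip is needed.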

Refactored MergeService.GetPotentialDuplicates

  • Made public with new identicalVernacular parameter
  • true → GetIdenticalVernWords, false → GetSimilarWords
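The speed gap between the two branches can be illustrated with a simplified model. This TypeScript sketch is not the C# `DuplicateFinder` code: `Word`, `groupIdenticalVern`, `groupSimilarVern`, and the edit-distance threshold are hypothetical stand-ins. One plausible reason for the gap, assumed here for illustration, is how each branch groups words:

- Identical-vernacular grouping is a single pass with a hash map.
- A similarity search compares every pair of words.

```typescript
// Hypothetical, simplified word shape; the real Word model has many more fields.
interface Word {
  id: string;
  vernacular: string;
}

// Identical vernacular: one pass, bucketing words by exact vernacular form.
function groupIdenticalVern(words: Word[]): Word[][] {
  const buckets = new Map<string, Word[]>();
  for (const w of words) {
    const bucket = buckets.get(w.vernacular);
    if (bucket) bucket.push(w);
    else buckets.set(w.vernacular, [w]);
  }
  // Only buckets holding 2+ words are potential duplicates.
  return Array.from(buckets.values()).filter((b) => b.length > 1);
}

// Levenshtein edit distance: a stand-in for whatever similarity metric is really used.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similar vernacular: compares every pair of words, so O(n^2) comparisons.
function groupSimilarVern(words: Word[], maxDistance = 1): Word[][] {
  const groups: Word[][] = [];
  for (let i = 0; i < words.length; i++) {
    const group = [words[i]];
    for (let j = i + 1; j < words.length; j++) {
      if (editDistance(words[i].vernacular, words[j].vernacular) <= maxDistance) {
        group.push(words[j]);
      }
    }
    if (group.length > 1) groups.push(group);
  }
  return groups;
}

// Mirrors the new flag: true -> identical grouping, false -> similar grouping.
function getPotentialDuplicates(
  words: Word[],
  maxInList: number,
  maxLists: number,
  identicalVernacular: boolean
): Word[][] {
  const groups = identicalVernacular ? groupIdenticalVern(words) : groupSimilarVern(words);
  // Cap the result at maxLists lists of at most maxInList words each.
  return groups.slice(0, maxLists).map((g) => g.slice(0, maxInList));
}
```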

Frontend Changes

Modified MergeDups flow

  1. Start goal → find identical duplicates synchronously
  2. Process duplicates normally
  3. When list empty → show IdenticalDuplicatesDialog
  4. User continues → trigger async similar duplicate search (original behavior)
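The flow above turns on one check whenever a list of duplicate sets is loaded or exhausted: are any sets left, and which search produced them? The minimal sketch below expresses that decision as a pure function. The `Phase` type, the `MergeDupsAction` union, and its `type` strings are hypothetical, not the actual Redux action names.

```typescript
// Which search produced the current list of duplicate sets.
type Phase = "identical" | "similar";

// Hypothetical actions; the names are illustrative only.
type MergeDupsAction =
  | { type: "reviewSets"; sets: string[][] }
  | { type: "showIdenticalDuplicatesDialog" }
  | { type: "showNothingToMerge" };

// Decide what to do once a list of duplicate sets has been loaded or exhausted.
function nextMergeDupsAction(phase: Phase, sets: string[][]): MergeDupsAction {
  if (sets.length > 0) {
    return { type: "reviewSets", sets };
  }
  // An empty identical list is not the end: offer the similar search instead.
  if (phase === "identical") {
    return { type: "showIdenticalDuplicatesDialog" };
  }
  // Only an empty similar list means there is nothing left to merge.
  return { type: "showNothingToMerge" };
}
```

Keeping the decision pure makes the new branch, which replaces the old "nothing to merge" screen for the identical phase, easy to unit-test.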

New IdenticalDuplicatesDialog component

  • Congratulates the user on finishing all identical-vernacular duplicate sets
  • Shows the deferred count with a "Review Deferred" button, if any sets were deferred
  • Warns if the project has >1000 frontier entries (slow processing expected)
  • Actions: Cancel, Review Deferred, Continue
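The dialog's conditional content reduces to a small pure function:

- the deferred count,
- the 1000-entry warning,
- the set of action buttons.

This is a sketch: `DialogModel`, `buildIdenticalDupsDialogModel`, and the message strings are hypothetical. The real component renders localized text.

```typescript
// Threshold from the issue: warn when the project has more than 1000 frontier entries.
const SLOW_FRONTIER_THRESHOLD = 1000;

type DialogButton = "cancel" | "reviewDeferred" | "continue";

interface DialogModel {
  messages: string[];
  buttons: DialogButton[];
}

function buildIdenticalDupsDialogModel(deferredCount: number, frontierCount: number): DialogModel {
  const messages = ["All sets of words with identical vernacular have been processed."];
  const buttons: DialogButton[] = ["cancel"];
  if (deferredCount > 0) {
    messages.push(`There are ${deferredCount} deferred duplicate sets.`);
    // "Review Deferred" only appears when there is something to review.
    buttons.push("reviewDeferred");
  }
  messages.push("Next, potential duplicates with similar vernacular forms will be found.");
  if (frontierCount > SLOW_FRONTIER_THRESHOLD) {
    messages.push("This project is large, so finding them may take several minutes.");
  }
  buttons.push("continue");
  return { messages, buttons };
}
```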

Added DataLoadStatus.IdenticalCompleted

  • Tracks transition state between identical and similar duplicate finding
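A small reducer can show where `IdenticalCompleted` sits in the load lifecycle. This sketch makes three assumptions:

- The enum's other members are `Default`, `Loading`, `Success`, and `Failure`.
- The string values and event names are hypothetical.
- The current phase is tracked alongside the status.

The PR may store these differently.

```typescript
// ASSUMPTION: members other than IdenticalCompleted, and all string values, are illustrative.
enum DataLoadStatus {
  Default = "DEFAULT",
  Loading = "LOADING",
  IdenticalCompleted = "IDENTICAL_COMPLETED",
  Success = "SUCCESS",
  Failure = "FAILURE",
}

type Phase = "identical" | "similar";

// Hypothetical events in the two-phase MergeDups load.
type LoadEvent = "start" | "listLoaded" | "listEmptied" | "continueToSimilar" | "error";

interface LoadState {
  status: DataLoadStatus;
  phase: Phase;
}

function reduceLoad(state: LoadState, event: LoadEvent): LoadState {
  switch (event) {
    case "start": // goal started: the fast identical search begins
      return { status: DataLoadStatus.Loading, phase: "identical" };
    case "listLoaded": // a list of duplicate sets arrived
      return state.status === DataLoadStatus.Loading ? { ...state, status: DataLoadStatus.Success } : state;
    case "listEmptied": // only the identical phase pauses at the dialog
      return state.status === DataLoadStatus.Success && state.phase === "identical"
        ? { ...state, status: DataLoadStatus.IdenticalCompleted }
        : state;
    case "continueToSimilar": // user clicked Continue: start the slow SignalR search
      return state.status === DataLoadStatus.IdenticalCompleted
        ? { status: DataLoadStatus.Loading, phase: "similar" }
        : state;
    case "error":
      return { ...state, status: DataLoadStatus.Failure };
    default:
      return state;
  }
}
```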

Example Flow

```text
// Before: single async call for all duplicates
asyncAddGoal(new MergeDups())
  → findDuplicates() // slow, finds everything via SignalR

// After: two phases with a dialog in between
asyncAddGoal(new MergeDups())
  → findIdenticalDuplicates() // fast, synchronous
  → [user processes duplicates]
  → IdenticalDuplicatesDialog appears when list empty
  → asyncLoadSimilarDuplicates() // slow, finds similar via SignalR
```

The change provides clear feedback at a natural breakpoint while maintaining backward compatibility with the async similar-duplicate search.

Original prompt

This section details the original issue you should resolve

<issue_title>[MergeDups] Give indication when all identical-vernacular duplicates have been dealt with</issue_title>
<issue_description>Once all sets of words with identical vernacular have been dealt with (or deferred), it is a good time to:

  • Congratulate the user for getting through them all
  • Warn that finding potential duplicates with similar vernacular forms takes a long time (e.g., 2.5 minutes for an 18,000 entry project)
  • If there are deferred duplicates, say so and offer a button to load those instead

Todo:

  • Backend/Services/MergeService.cs:
    • Add a new parameter bool identicalVernacular to GetPotentialDuplicates, right after int maxLists. If true, use dupFinder.GetIdenticalVernWords; if false, use dupFinder.GetSimilarWords(collection, isUnavailableSet, ignoreProtected).
    • Update GetAndStorePotentialDuplicates to call GetPotentialDuplicates with identicalVernacular = false
  • Backend/Controllers/MergeController.cs:
    • Add a method for the new endpoint [HttpGet("findidenticaldups/{maxInList:int}/{maxLists:int}/{ignoreProtected:bool}", Name = "FindIdenticalPotentialDuplicates")]. Unlike the existing FindPotentialDuplicates, finding words with identical vernacular is fast, so instead of using a get-then-signal function, call _mergeService.GetPotentialDuplicates directly with identicalVernacular = true
  • Update interfaces, mocks, and tests
  • Run npm run fmt-backend
  • Run npm run backend, activate the Python virtual environment (venv) specified in the README, then run python scripts/generate_openapi.py
  • src/backend/index.ts:
    • Add new function findIdenticalDuplicates (before findDuplicates) that uses mergeApi.findIdenticalPotentialDuplicates
  • src/goals/Redux/GoalActions.ts:
    • asyncAddGoal no longer needs a special case for GoalType.MergeDups
    • loadGoalData in the GoalType.MergeDups case should use the new findIdenticalDuplicates function
  • src/goals/MergeDuplicates/*
    • If loadGoalData loaded an empty list, then instead of the "nothing to merge" screen, an action should be dispatched to do the stuff dropped from asyncAddGoal
    • While loading (i.e., waiting for the signal), a dialog should pop up over the loading screen:
      • Congratulate the user for having processed all sets of words with identical vernacular
      • If there are any deferred dups, let the user know how many there are, and include a button at the bottom of the dialog to review the deferred duplicates
      • Explain that The Combine is now finding potential duplicates with similar (non-identical) vernacular forms.
      • If the project has more than 1000 frontier entries, warn that this can take several minutes
      • Have a button at the bottom to cancel (returns to Data Entry)
      • Have a button at the bottom to continue loading
      • Automatically close the dialog if the signal comes that the duplicate finding is done</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 5 commits November 21, 2025 19:58
…egration

Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add indication for resolving identical-vernacular duplicates" to "[MergeDups] Separate identical vs similar duplicate finding with intermediate dialog" Nov 21, 2025
Copilot AI requested a review from imnasnainaec November 21, 2025 20:16

github-actions bot commented Nov 24, 2025

⚠️ Commit Message Format Issues ⚠️
commit 165c2c3464:
1: T1 Title exceeds max length (77>72): "Backend: Add findIdenticalDuplicates endpoint and update frontend integration"


codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 41.26984% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.47%. Comparing base (a49f082) to head (60f9a89).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...oals/MergeDuplicates/IdenticalDuplicatesDialog.tsx | 4.76% | 20 Missing ⚠️ |
| src/goals/DefaultGoal/BaseGoalScreen.tsx | 0.00% | 9 Missing ⚠️ |
| src/goals/Redux/GoalActions.ts | 73.33% | 3 Missing and 1 partial ⚠️ |
| Backend/Services/MergeService.cs | 50.00% | 1 Missing and 1 partial ⚠️ |
| src/goals/MergeDuplicates/MergeHub.tsx | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #4039      +/-   ##
==========================================
- Coverage   74.66%   74.47%   -0.19%     
==========================================
  Files         293      295       +2     
  Lines       10890    10939      +49     
  Branches     1364     1369       +5     
==========================================
+ Hits         8131     8147      +16     
- Misses       2362     2396      +34     
+ Partials      397      396       -1     
```
| Flag | Coverage Δ |
| --- | --- |
| backend | 85.95% <87.50%> (+0.29%) ⬆️ |
| frontend | 65.59% <25.53%> (-0.53%) ⬇️ |

Flags with carried forward coverage won't be shown.


Development

Successfully merging this pull request may close these issues.

[MergeDups] Give indication when all identical-vernacular duplicates have been dealt with
