@DaisukeYoda DaisukeYoda commented Feb 7, 2026

Summary

  • Lower thresholds: Type1: 0.98→0.85, Type2: 0.95→0.75, Type3: 0.85→0.70, Type4: 0.80→0.65 for broader detection
  • Re-include Type-3 in default enabled types: The multi-metric classifier assigns ~93% of detected pairs as Type-3 (APTED structural). Excluding it caused filterCloneGroups to discard all groups, making Duplication score always 100/100
  • Switch grouping from K-Core to Connected: K-Core (k=2) requires every member to have at least two neighbors, so an isolated 1:1 pair never forms a group
  • Align SimilarityThreshold (0.65) and GroupingThreshold with Type-4 threshold
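To illustrate the Type-3 point above, here is a minimal sketch of an enabled-type filter. It is not the project's code: `ClonePair`, `filter_by_enabled_types`, and the sample pairs are hypothetical stand-ins for whatever filterCloneGroups actually operates on. It only shows why excluding the type that most pairs carry can leave nothing to score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClonePair:
    """Hypothetical record for one detected clone pair."""
    left: str
    right: str
    clone_type: int      # 1..4
    similarity: float


def filter_by_enabled_types(pairs, enabled_types):
    """Keep only pairs whose clone type is enabled.

    Assumed stand-in for the filtering step; the real filterCloneGroups
    may work on groups rather than on individual pairs.
    """
    return [p for p in pairs if p.clone_type in enabled_types]


# Illustrative data only: every pair is classified as Type-3.
sample_pairs = [
    ClonePair("parse_a", "parse_b", 3, 0.82),
    ClonePair("load_x", "load_y", 3, 0.74),
    ClonePair("fmt_1", "fmt_2", 3, 0.71),
]

without_type3 = filter_by_enabled_types(sample_pairs, {1, 2, 4})
with_type3 = filter_by_enabled_types(sample_pairs, {1, 2, 3, 4})

print(len(without_type3), len(with_type3))
```

When nearly all pairs are Type-3, the first call returns an empty list, which matches the "0 groups, 100/100" symptom in the Before table.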

Before (all repos scored Duplication 100/100)

| Repo | Score   | Duplication | Groups |
|------|---------|-------------|--------|
| A    | 100 (A) | 100/100     | 0      |
| B    | 100 (A) | 100/100     | 0      |

After (scores now differentiate properly)

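| Repo | Score  | Duplication | Groups |
|------|--------|-------------|--------|
| A    | 86 (B) | 30/100      | 2      |
| B    | 95 (A) | 75/100      | 1      |
| C    | 93 (A) | 65/100      | 1      |
| D    | 96 (A) | 80/100      | 1      |
| E    | 97 (A) | 85/100      | 3      |
| F    | 92 (A) | 60/100      | 18     |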

Test plan

  • make test: all tests pass
  • make fmt / make lint: clean
  • Verified on 6 well-known Python libraries
  • False positive check: pairs below sim 0.65 are now excluded; top pairs are legitimate

Lower clone type thresholds (Type1: 0.98→0.85, Type2: 0.95→0.75,
Type3: 0.85→0.70, Type4: 0.80→0.65) for broader detection.

Re-include Type-3 in default enabled clone types: the multi-metric
classifier assigns Type-3 (APTED structural comparison) to ~93% of
detected clones. Excluding Type-3 caused filterCloneGroups to discard
all groups, resulting in Duplication score always being 100/100.

Switch default grouping from K-Core to Connected so that all clone
pairs form groups (K-Core requires k>=2 neighbors, missing 1:1 pairs).
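The difference between the two grouping modes can be sketched on a tiny clone graph. This is a generic illustration of k-core pruning versus connected components, not the project's implementation; the function names and the sample pairs are assumptions made for the example.

```python
from collections import defaultdict


def build_adjacency(pairs):
    """Undirected adjacency sets from (a, b) clone pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_groups(pairs):
    """Group fragments by connected component (any path of clone pairs)."""
    adj = build_adjacency(pairs)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


def k_core_nodes(pairs, k=2):
    """Repeatedly drop nodes with fewer than k surviving neighbors."""
    adj = build_adjacency(pairs)
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(adj[node] & alive) < k:
                alive.remove(node)
                changed = True
    return alive


# One isolated 1:1 pair (a, b) and one triangle (c, d, e).
pairs = [("a", "b"), ("c", "d"), ("c", "e"), ("d", "e")]

print(connected_groups(pairs))      # both the pair and the triangle are grouped
print(sorted(k_core_nodes(pairs)))  # only the triangle survives k=2
```

In the sketch, each fragment of the 1:1 pair has exactly one neighbor, so k=2 pruning removes both and the pair never reaches a group, while connected grouping keeps it.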

Align SimilarityThreshold (0.65) and GroupingThreshold with Type-4
threshold to ensure all detected clones appear in reports and groups.
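A small sketch of why the alignment matters, under the assumption that detection and reporting are separate cut-offs applied in sequence. The 0.65 value comes from this PR; the 0.80 "misaligned" value, the function names, and the similarity scores are hypothetical and chosen only to make the gap visible.

```python
TYPE4_THRESHOLD = 0.65  # Type-4 threshold after this PR


def detect(similarities, type4_threshold=TYPE4_THRESHOLD):
    """Treat any pair at or above the lowest clone-type threshold as detected."""
    return [s for s in similarities if s >= type4_threshold]


def report(detected, similarity_threshold):
    """Apply the reporting cut-off to pairs that were already detected."""
    return [s for s in detected if s >= similarity_threshold]


scores = [0.92, 0.78, 0.70, 0.66, 0.60]   # illustrative similarities
detected = detect(scores)

misaligned = report(detected, 0.80)            # hypothetical higher cut-off
aligned = report(detected, TYPE4_THRESHOLD)    # cut-off equal to Type-4

print(len(detected), len(misaligned), len(aligned))
```

With the hypothetical higher cut-off, pairs between 0.65 and 0.80 are detected but never shown; with equal thresholds, everything detected is also reported.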
@DaisukeYoda DaisukeYoda self-assigned this Feb 7, 2026
@DaisukeYoda DaisukeYoda merged commit 05601e5 into main Feb 7, 2026
4 checks passed
@DaisukeYoda DaisukeYoda deleted the fix/clone-scoring-calibration branch February 7, 2026 05:26
@DaisukeYoda DaisukeYoda mentioned this pull request Feb 7, 2026