SemDedup outputs docs_to_remove + All dedup modules extend BaseDeduplicationModule #617
sarahyurick left a comment
Nothing was changed from #615, right? Assuming yes, and that GPU CI passes, LGTM.
VibhuJawa left a comment
Thanks a lot for cleaning up and aligning all these classes. This is helping us reduce our tech debt a lot.
Left a minor review comment asking to add tests for both code paths of removing and adding these ids.
Force-pushed from 7a794a7 to e905655
Signed-off-by: Praateek <praateekm@gmail.com>
Maghoumi left a comment
Reviewed the tutorial changes. They look good to me.
```python
    config=semdedup_config,
    input_column="text",
    id_column="id",
    perform_removal=True,
```
Oh, that's very neat! Was this recently added to the SemDedup API?
Also, do the other dedup modules offer a similar feature?
This is being added to SemDedup in this PR itself, and after this PR all dedup modules will offer perform_removal (see #516).
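The identify-then-optionally-remove flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the class names (`BaseDeduplicationModuleSketch`, `ExactTextDedupSketch`), the `identify_duplicates`/`remove` method names, and the list-of-dicts input are hypothetical. The sketch shows a base class whose `__call__` always computes the ids to remove (`docs_to_remove`) and only drops those documents when `perform_removal=True`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set

# Hypothetical record type: each document is a dict with an id and a text field.
Record = Dict[str, object]


class BaseDeduplicationModuleSketch(ABC):
    """Hypothetical stand-in for a shared dedup base class: identify, then optionally remove."""

    def __init__(self, id_column: str = "id", perform_removal: bool = False):
        self.id_column = id_column
        self.perform_removal = perform_removal

    @abstractmethod
    def identify_duplicates(self, docs: List[Record]) -> Set[object]:
        """Return the ids of documents to remove (the docs_to_remove set)."""

    def remove(self, docs: List[Record], docs_to_remove: Set[object]) -> List[Record]:
        # Keep only documents whose id is not flagged for removal.
        return [d for d in docs if d[self.id_column] not in docs_to_remove]

    def __call__(self, docs: List[Record]):
        docs_to_remove = self.identify_duplicates(docs)
        if self.perform_removal:
            return self.remove(docs, docs_to_remove)
        # Without removal, hand back the ids so callers can inspect or remove them later.
        return docs_to_remove


class ExactTextDedupSketch(BaseDeduplicationModuleSketch):
    """Toy subclass (hypothetical): flags every document whose text was already seen."""

    def __init__(self, text_column: str = "text", **kwargs):
        super().__init__(**kwargs)
        self.text_column = text_column

    def identify_duplicates(self, docs: List[Record]) -> Set[object]:
        seen, dupes = set(), set()
        for d in docs:
            key = d[self.text_column]
            if key in seen:
                dupes.add(d[self.id_column])
            else:
                seen.add(key)
        return dupes


docs = [
    {"id": 1, "text": "hello"},
    {"id": 2, "text": "world"},
    {"id": 3, "text": "hello"},
]
print(ExactTextDedupSketch()(docs))  # {3}
print(ExactTextDedupSketch(perform_removal=True)(docs))  # [{'id': 1, 'text': 'hello'}, {'id': 2, 'text': 'world'}]
```

In this sketch, keeping identification and removal as separate steps lets a caller inspect or persist `docs_to_remove` before filtering; the `perform_removal` flag only decides which of the two results `__call__` returns, so both code paths can be tested against the same input.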
Description
Closes #516
Usage
```python
# Add snippet demonstrating usage
```
Checklist