Actions: ayushdg/NeMo-Curator

Test Python package

29 workflow runs

Improvements for semantic deduplication and DAPT tutorial (#564)
Test Python package #29: Commit 119edd4 pushed by ayushdg
February 24, 2025 22:23 6m 15s main
Update get_all_files_paths_under examples to include `keep_extensio…
Test Python package #28: Commit 9b1a13c pushed by ayushdg
February 21, 2025 20:32 5m 49s main
Update fuzzy deduplication section of tutorials to skip false positiv…
Test Python package #27: Commit c4cb682 pushed by ayushdg
February 11, 2025 23:40 5m 11s main
Update fuzzy deduplication to skip false positive checks as the defau…
Test Python package #26: Commit fe41ac1 pushed by ayushdg
January 30, 2025 19:13 5m 33s main
Create notebook tutorials for distributed data classifiers (#415)
Test Python package #25: Commit cd38de0 pushed by ayushdg
January 27, 2025 18:59 5m 3s main
Add Python 3.10 to unit test matrix (#496)
Test Python package #24: Commit d31c29f pushed by ayushdg
January 23, 2025 21:38 7m 26s main
[REVIEW] Fix Sem Dedup (#478)
Test Python package #23: Commit 7cfda44 pushed by ayushdg
January 14, 2025 19:46 4m 50s main
Clean up internal column logic in _run_classifier_helper function (…
Test Python package #22: Commit 694970a pushed by ayushdg
January 6, 2025 20:50 4m 32s main
[REVIEW] Speedup Connected Components (#302)
Test Python package #21: Commit 36fcf50 pushed by ayushdg
October 30, 2024 18:47 5m 19s main
Write to file without including "filename" column (#317)
Test Python package #20: Commit 7d7767b pushed by ayushdg
October 23, 2024 23:49 5m 12s main
Fix enabling spilling by enabling it on client process (#275)
Test Python package #19: Commit d9c414b pushed by ayushdg
October 3, 2024 18:43 5m 11s main
Enabled nightly build using RAPIDS nightly (#237)
Test Python package #18: Commit c89c115 pushed by ayushdg
September 19, 2024 21:11 5m 10s main
Add option to skip false positive checks during Fuzzy Deduplication (…
Test Python package #17: Commit 982e7ec pushed by ayushdg
September 6, 2024 21:24 9m 38s main
Change combinations() to pairwise() when constructing a list of edges in _BucketsToEdges
Test Python package #16: Pull request #2 opened by yury-tokpanov
September 3, 2024 18:50 6m 3s patch-1
Fix a few bugs in fuzzy dedup and docs (#156)
Test Python package #15: Commit e654281 pushed by ayushdg
July 30, 2024 00:28 5m 27s main
Enable Sem-dedup (#130)
Test Python package #14: Commit e557ee3 pushed by ayushdg
July 8, 2024 19:40 5m 15s main
Fix #116. Fix task-decontamination broken links (#117)
Test Python package #13: Commit 462b964 pushed by ayushdg
June 18, 2024 22:16 6m 31s main
Update index.rst (#113)
Test Python package #12: Commit f1e993b pushed by ayushdg
June 14, 2024 00:03 5m 49s main
Applying SEO Best Pratices (#104)
Test Python package #11: Commit 38b0ac1 pushed by ayushdg
June 12, 2024 19:57 5m 33s main
Update readme (#93)
Test Python package #10: Commit e814736 pushed by ayushdg
June 3, 2024 17:44 5m 28s main
Fuzzy Dedup: Use text_field instead of hardcoded text column (#74)
Test Python package #9: Commit 8755cdc pushed by ayushdg
May 22, 2024 23:18 5m 21s main
Remove argparse from get_client function signature (#12)
Test Python package #8: Commit 5e46cd8 pushed by ayushdg
May 22, 2024 21:49 5m 18s main
Align extract_partitioning_index logic with upstream shuffling (#60)
Test Python package #7: Commit ecd4f4b pushed by ayushdg
May 15, 2024 20:46 5m 12s main
[Tutorials] Add a tutorial for PEFT data curation (#45)
Test Python package #6: Commit 06ee061 pushed by ayushdg
May 10, 2024 18:33 7m 59s main
Fix issue #43 (empty files creation) and improve reading/writing spee…
Test Python package #5: Commit 72b9775 pushed by ayushdg
May 9, 2024 17:09 5m 26s main