A tool for cleaning and preprocessing plaintext files: it removes every character that is not a letter, digit, or punctuation mark, and can optionally collapse runs of repeated punctuation.
nlp sanitization machine-learning natural-language-processing automation regex data-transformation data-analysis mit-license command-line-tool text-processing data-preprocessing regular-expressions plaintext data-cleaning numeric-data file-manipulation punctuation-handling machine-learning-preprocessing
Updated Jan 31, 2024 · Rust
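The description leaves the exact character classes open, so the following is a minimal Rust sketch of one plausible reading of the two operations, not this project's implementation. It assumes that "alphabetic/numeric" means Unicode letters and digits (`char::is_alphanumeric`), that "punctuation" means ASCII punctuation (`char::is_ascii_punctuation`), that whitespace is kept so words stay separated, and that "collapse" means replacing a run of the same punctuation character with one occurrence. The function name `sanitize` and the `collapse_punct` flag are hypothetical.

```rust
/// Hypothetical sketch: `sanitize` and `collapse_punct` are illustrative
/// names, not this project's API.
fn sanitize(input: &str, collapse_punct: bool) -> String {
    let mut out = String::with_capacity(input.len());
    // Last character actually written to `out` (removed characters never update it).
    let mut prev: Option<char> = None;

    for c in input.chars() {
        // Assumption: keep Unicode letters/digits, ASCII punctuation, and whitespace.
        let keep = c.is_alphanumeric() || c.is_ascii_punctuation() || c.is_whitespace();
        if !keep {
            continue;
        }
        // Assumption: collapse only runs of the *same* punctuation character.
        if collapse_punct && c.is_ascii_punctuation() && prev == Some(c) {
            continue;
        }
        out.push(c);
        prev = Some(c);
    }
    out
}

fn main() {
    // U+FFFD (replacement character) stands in for typical encoding debris.
    let raw = "Hello!!!\u{FFFD} Price: $5..., ok??";
    println!("{}", sanitize(raw, true)); // prints "Hello! Price: $5., ok?"
    println!("{}", sanitize(raw, false)); // prints "Hello!!! Price: $5..., ok??"
}
```

Because filtering happens before the run check, a removed character between two identical marks does not break the run: `"a!\u{FFFD}!b"` becomes `"a!b"` with collapsing enabled. A tool that collapsed first and filtered afterwards would instead produce `"a!!b"`; which order the project uses is not stated here.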