Python library for handling audio datasets.
Trainable categorization tool
💧 In memory dataset filtering
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
Cleaning Discord data for NLP
Command-line filter for GitHub repositories that contain "samples" rather than a real project, framework, or library
🚀 Whenever you need to look through a huge pile of images and cannot use the power of a file explorer, or you just work on a remote headless machine, you can use this tool. It also lets you move files from one folder to another, creating the destination if it does not exist. Work in progress.
[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
Face recognition approach by exploring information jointly in space, scale and orientation.
A simple library that wraps common data processing tasks into an easy-to-use preprocessing engine. The library currently supports transforming CSV files loaded into a pandas DataFrame.
Image Filter Tool V2 is a powerful web-based application designed to streamline the process of filtering, categorizing, and managing large image datasets manually.
A set of tools to generate and label datasets from academic papers
Module of the Open City Toolkit to visualize the use of open datasets by applications
Fast Spark Expression - Write column expressions quickly and easily like a string
Compare pictures, keep 2
Data Cleaning - A project which takes all colleges in the US, and narrows down the suitable colleges by slicing, dicing and concatenating startup activity data and crime statistics.