dataset-tools is a modular utility toolkit for managing datasets in machine learning and AI applications. It simplifies common tasks such as:
- Converting labels between formats (e.g., YOLO TXT).
- Visualizing bounding boxes on images from annotation files.
- Renaming and organizing dataset files.
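Both label conversion and bounding-box drawing start from the YOLO TXT layout, where each line reads "class x_center y_center width height" and the four coordinates are normalized to the image width and height. The sketch below converts such lines to pixel corner coordinates. Box, yolo_line_to_box, and read_yolo_file are hypothetical names for illustration, not the toolkit's own API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A bounding box in absolute pixel coordinates (corner format)."""
    class_id: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def yolo_line_to_box(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO TXT line to pixel corner coordinates.

    Assumes the standard layout "class x_center y_center width height",
    with all four coordinates normalized to [0, 1] by the image size.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Scale back to pixels and move from the box center to its corners.
    x_min = round((xc - w / 2) * img_w)
    y_min = round((yc - h / 2) * img_h)
    x_max = round((xc + w / 2) * img_w)
    y_max = round((yc + h / 2) * img_h)
    return Box(class_id, x_min, y_min, x_max, y_max)


def read_yolo_file(path: str, img_w: int, img_h: int) -> list[Box]:
    """Read every non-empty line of a YOLO annotation file."""
    with open(path, encoding="utf-8") as f:
        return [yolo_line_to_box(ln, img_w, img_h) for ln in f if ln.strip()]


if __name__ == "__main__":
    # A 640x480 image with one box centered in the frame, half its size.
    print(yolo_line_to_box("0 0.5 0.5 0.5 0.5", 640, 480))
    # → Box(class_id=0, x_min=160, y_min=120, x_max=480, y_max=360)
```

The corners (x_min, y_min) and (x_max, y_max) are the two points OpenCV's cv2.rectangle takes as its pt1 and pt2 arguments, which is why drawing boxes usually begins with this conversion.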
Tech used in this repository:
- isort and black keep the code consistent and clean.
- OpenCV handles computer vision tasks.
- python-dotenv loads project settings from a .env file.
Follow these steps to set up the project locally:
- Clone the repository and move into it:
    git clone https://github.com/Dhaboav/dataset-tools.git
    cd dataset-tools
- Install the required Python packages using pip:
    pip install -r requirements.txt
- Create a .env file by copying .env.example in the Windows Command Prompt, then change its values:
    copy .env.example .env
  On macOS or Linux, use cp .env.example .env instead.
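Once .env exists, python-dotenv can load it into the process environment. The sketch below shows one way a script might read its settings. The keys DATASET_DIR and OUTPUT_DIR and the helper load_settings are hypothetical, since the keys in .env.example are not listed here. It uses python-dotenv's load_dotenv when the package is installed and otherwise falls back to a deliberately simplified parser that exists only to keep the sketch runnable.

```python
import os
from pathlib import Path

try:
    # python-dotenv (installed via requirements.txt) provides load_dotenv.
    from dotenv import load_dotenv
except ImportError:
    def load_dotenv(dotenv_path: str = ".env", override: bool = False) -> bool:
        """Simplified stand-in used only when python-dotenv is missing.

        Handles KEY=VALUE lines and # comments; unlike python-dotenv it does
        not handle `export`, multiline values, or variable expansion.
        """
        path = Path(dotenv_path)
        if not path.is_file():
            return False
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("'\"")
            # Like python-dotenv's default, keep variables that already exist.
            if override or key not in os.environ:
                os.environ[key] = value
        return True


def load_settings(env_path: str = ".env") -> dict:
    """Load env_path into os.environ and return the settings a script needs.

    DATASET_DIR and OUTPUT_DIR are hypothetical keys; use the ones that
    .env.example actually defines.
    """
    load_dotenv(env_path)
    return {
        "DATASET_DIR": os.getenv("DATASET_DIR", "dataset"),
        "OUTPUT_DIR": os.getenv("OUTPUT_DIR", "output"),
    }
```

Reading every setting through os.getenv with a default keeps scripts usable even when a key is missing from .env.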
To see how to use specific functions, look at the scripts in the examples folder:
- draw_bboxes_demo.py: Demonstrates how to draw bounding boxes from YOLO TXT annotations onto images using OpenCV and save the resulting images.
- rename_demo.py: Demonstrates how to rename files.
- split_dataset_demo.py: Demonstrates how to split a dataset into training, validation, and test sets in proportions of 70%, 20%, and 10%, respectively.
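The 70/20/10 split described for split_dataset_demo.py can be sketched as a seeded shuffle followed by slicing. This is not the repository's implementation: the function name split_dataset, the default seed, and the choice to give rounding leftovers to the test set are all assumptions.

```python
import random
from typing import Sequence


def split_dataset(items: Sequence[str],
                  ratios: Sequence[float] = (0.7, 0.2, 0.1),
                  seed: int = 42):
    """Shuffle items and split them into (train, val, test) lists."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers that sum to 1")
    shuffled = list(items)
    # A dedicated Random instance with a fixed seed makes the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    # Rounding down the first two splits sends any leftovers to the test
    # split, so no item is dropped.
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Splitting a list of file stems (for example "img_001") rather than image paths, and then copying both the image and its matching .txt label for each stem, keeps every image in the same split as its YOLO annotation.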
To fix module import errors when running scripts in the examples folder, set PYTHONPATH in your VSCode settings so Python recognizes the project's root directory:
- Press Ctrl+Shift+P in VSCode and run Preferences: Open User Settings (JSON).
- Add this configuration:
    {
      "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}"
      }
    }
  On macOS or Linux, use the terminal.integrated.env.osx or terminal.integrated.env.linux key instead.
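The VSCode setting works because Python adds each PYTHONPATH entry to sys.path at startup, which is where import looks for modules. When running an example outside VSCode's integrated terminal, a script can do the same for itself. A minimal sketch, assuming the script sits one directory below the project root as the files in examples appear to; ensure_project_root_on_path is a hypothetical helper, not part of dataset-tools.

```python
import sys
from pathlib import Path


def ensure_project_root_on_path(script_file: str, levels_up: int = 1) -> Path:
    """Put the project root on sys.path so imports of top-level modules work.

    Assumes the script lives `levels_up` directories below the root, e.g.
    examples/rename_demo.py -> levels_up=1. Unlike PYTHONPATH, this only
    affects the current Python process.
    """
    # parents[0] is the script's own folder, parents[1] the folder above it.
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

A script would call ensure_project_root_on_path(__file__) before importing project modules. Setting PYTHONPATH in the shell or in VSCode remains the cleaner option because it needs no changes to the scripts.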