Skip to content

Slice is a python program to automatically cut audio files into smaller duration clips. The program automatically avoids cutting in-between words to get clips with whole sentences, better for training Natural Language Processing models.

Notifications You must be signed in to change notification settings

divij-pawar/slice

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Slice: Audio Segmentation for NLP Datasets

Slice is a command-line utility designed to automatically segment long audio recordings into smaller clips based on voice activity.

It prepares raw audio for Natural Language Processing (NLP) and Speech-to-Text (STT) model training (like Whisper, Kaldi, or Wav2Vec2) by ensuring clips contain distinct speech segments and creating standard metadata manifests.

Features

  • Robust Voice Activity Detection (VAD): Uses WebRTC VAD to distinguish human speech from background noise, which is more accurate than simple energy-based silence detection.
  • Automatic Preprocessing: Automatically converts audio to 16kHz Mono (16-bit), the standard format required by most ASR models.
  • Metadata Generation: Outputs a manifest.jsonl file containing filenames and durations alongside the audio clips.
  • CLI Support: Fully automatable via command line arguments.
  • Batch Processing: Process single files or entire directories of audio at once.
  • Dry Run Mode: Preview how audio will be split without writing files.

Installation

  1. Clone the repository

    git clone [https://github.com/divij-pawar/slice.git](https://github.com/divij-pawar/slice.git)
    cd slice
  2. Install Python dependencies

    pip install -r requirements.txt
  3. Install FFmpeg (Required) Slice relies on pydub to load audio files, which requires FFmpeg.

    • Mac: brew install ffmpeg
    • Linux: sudo apt-get install ffmpeg
    • Windows: Download FFmpeg and add it to your PATH.

Usage

Basic Command

Slice a single audio file using default settings. This will create a folder containing .wav clips and a manifest.jsonl.

python slice.py audio/interview.wav

Batch Process a Folder

Process every .wav or .mp3 file in a folder:

python slice.py data/raw_recordings --output data/processed_dataset

The "Dry Run" (Safe Mode)

Unsure about your settings? Use --dry-run to see the split timestamps without creating files:

python slice.py audio/interview.wav --dry-run

Configuration / Arguments

You can tune the VAD sensitivity to fit different microphone qualities or background noise levels.

Argument Default Description
input_path Required Path to a file or directory.
--output sliced_audio Directory to save the result clips and manifest.
--aggressiveness 2 VAD aggressiveness level (0-3). 3 is the most strict at filtering non-speech.
--padding 300 Milliseconds of silence allowed around speech chunks. Higher values keep words from being cut off.
--min-duration 1.0 Minimum duration (in seconds) for a clip to be kept. Useful for filtering clicks/coughs.
--dry-run False If set, prints stats but does not save files.
--verbose False Prints detailed processing info for every clip saved.

Examples

Noisy Audio: If the audio has significant background noise, increase the aggressiveness to strictly detect human voice:

python slice.py podcast.wav --aggressiveness 3

Keep Short Utterances: To keep very short responses (like "Yes" or "No"), reduce the minimum duration:

python slice.py speech.wav --min-duration 0.5

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

Distributed under the MIT License. See LICENSE for more information.

About

Slice is a python program to automatically cut audio files into smaller duration clips. The program automatically avoids cutting in-between words to get clips with whole sentences, better for training Natural Language Processing models.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages