AutoCaptionGenAI

This project extracts audio from video files, transcribes the speech using Google's Speech Recognition API, translates it into a target language (default: English), and generates subtitles in SRT format. The project processes audio chunks with a slight overlap to improve transcription accuracy, then translates and writes each chunk's transcription into an SRT file with timestamps.

Features

  • Audio extraction from video files.
  • Speech-to-text transcription using the speech_recognition library.
  • Translation of transcribed text into a target language using googletrans.
  • SRT subtitle file generation with proper timing.
  • Progress tracking using tqdm while processing audio chunks.
  • Configurable source and destination languages for transcription and translation.

Table of Contents

  • Installation
  • Additional Dependencies
  • Usage
  • Example
  • Customization
  • Project Structure
  • To-Do / Future Enhancements
  • License

Installation

To run this project, you need to have Python 3 installed along with some required libraries.

  1. Clone the repository:

    git clone https://github.com/NormVg/AutoCaptionGenAI
    cd AutoCaptionGenAI
  2. Install required dependencies:

    pip install -r requirements.txt
  3. Required libraries include:

    • pydub
    • speech_recognition (installed from PyPI as SpeechRecognition)
    • googletrans
    • tqdm
    • moviepy

    You can install them manually if needed:

    pip install pydub speechrecognition googletrans==4.0.0-rc1 tqdm moviepy

Additional Dependencies

Ensure that ffmpeg or libav is installed on your system to enable audio and video processing via moviepy. You can install ffmpeg using:

  • On Ubuntu:

    sudo apt update && sudo apt install ffmpeg
  • On macOS using Homebrew:

    brew install ffmpeg
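
If you want to verify the dependency from Python before running the script, a quick check like the one below works (this snippet is illustrative only and is not part of main.py):

import shutil

# Fail early if ffmpeg is not on PATH; moviepy and pydub both rely on it.
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found on PATH; install it before running the pipeline")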

Usage

Command-Line

  1. Run the program by passing a video file path as an argument:

    python main.py path_to_video.mp4

    This will extract the audio, process it, generate transcriptions, translate the text, and output an SRT file (caption.srt by default).
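
For reference, the extraction step corresponds roughly to the moviepy call below (a minimal sketch assuming the classic moviepy.editor API; the helper names inside main.py may differ):

from moviepy.editor import VideoFileClip

# Pull the audio track out of the video and write it to a WAV file.
clip = VideoFileClip("path_to_video.mp4")
clip.audio.write_audiofile("output_audio.wav")
clip.close()

# The WAV file is then handed to the transcription/translation pipeline,
# e.g. Audio2SrtFile("output_audio.wav", srcLang="hi", distLang="en").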

SRT Output

By default, the SRT file is saved as caption.srt. The file will contain timestamps and translated subtitles for the audio extracted from the video.

Example

Here is an example of how to run the script:

python main.py my_video.mp4

Expected Output

  1. Audio will be extracted from my_video.mp4 into output_audio.wav.
  2. An SRT file (caption.srt) will be generated, containing timestamps and the translated subtitles.

The structure of an SRT file:

1
00:00:00,000 --> 00:00:08,000
hello, welcome to this demo

2
00:00:08,000 --> 00:00:16,000
how are you today?
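
Each entry's timestamps use the standard HH:MM:SS,mmm format. A small helper along these lines (shown purely as an illustration; the repository's own formatting code may differ) converts a millisecond offset into an SRT timestamp:

def ms_to_srt_timestamp(ms: int) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

print(ms_to_srt_timestamp(8000))  # 00:00:08,000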

Customization

1. Change the Chunk Duration

You can adjust the length of each audio chunk being processed by modifying the chunk_duration_ms and overlap_duration_ms in the split_audio function.

chunk_duration_ms = 15000  # 15 seconds in milliseconds
overlap_duration_ms = 2000  # 2 seconds of overlap
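
As a rough illustration of how overlapping chunking works (a sketch based on pydub's AudioSegment; the repository's split_audio may be implemented differently):

import os
from pydub import AudioSegment

def split_audio(wav_path, out_dir="output_directory",
                chunk_duration_ms=15000, overlap_duration_ms=2000):
    # Each chunk starts overlap_duration_ms before the previous one ends,
    # so words that straddle a boundary are not lost.
    os.makedirs(out_dir, exist_ok=True)
    audio = AudioSegment.from_wav(wav_path)
    step = chunk_duration_ms - overlap_duration_ms
    chunk_paths = []
    for i, start in enumerate(range(0, len(audio), step)):
        chunk = audio[start:start + chunk_duration_ms]
        path = os.path.join(out_dir, f"chunk_{i}.wav")
        chunk.export(path, format="wav")
        chunk_paths.append(path)
    return chunk_paths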

2. Modify Source and Target Language

By default, the source language is set to Hindi (hi) and the target language to English (en). You can change these in the Audio2SrtFile function call:

Audio2SrtFile("output_audio.wav", srcLang="fr", distLang="es")

This example will transcribe French audio and translate it into Spanish.
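
The translation itself is handled by googletrans (see Features above); with the pinned 4.0.0-rc1 release the call behaves roughly like this minimal sketch, which is not the exact code from main.py:

from googletrans import Translator

translator = Translator()
# src is the spoken language of the audio, dest is the subtitle language.
result = translator.translate("bonjour tout le monde", src="fr", dest="es")
print(result.text)  # e.g. "hola a todos"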

3. Change the Output SRT File Name

You can change the name of the output SRT file by passing a different srtFile parameter:

Audio2SrtFile("output_audio.wav", srcLang="hi", distLang="en", srtFile="my_custom_caption.srt")

4. Tweak Speech Recognition Language

The speech recognition language can be modified in the recognize_speech_from_wav function by changing the lang argument:

recognize_speech_from_wav("output_directory/some_audio.wav", lang='fr-FR')  # Recognize French
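
The lang value is passed through to Google's recognizer. A minimal version of such a function, written against the speech_recognition API and not necessarily matching the exact implementation in main.py, looks like this (the hi-IN default is an assumption based on the project's Hindi source-language default):

import speech_recognition as sr

def recognize_speech_from_wav(wav_path, lang="hi-IN"):
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # load the whole chunk into memory
    # Send the chunk to Google's free Web Speech API for transcription.
    return recognizer.recognize_google(audio, language=lang)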

Project Structure

.
├── main.py                # Main script for extracting audio, recognizing speech, translating, and generating SRT
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
└── output_directory/      # Directory for storing intermediate audio chunks

To-Do / Future Enhancements

  • Parallel processing of audio chunks to improve performance (a possible direction is sketched after this list).
  • Error handling for cases where speech recognition or translation fails.
  • Add more translation services for greater flexibility.
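
For the first two items, one possible direction (a sketch built on concurrent.futures around the recognize_speech_from_wav function described earlier, not a finished design) could look like this:

from concurrent.futures import ThreadPoolExecutor

# recognize_speech_from_wav is the function from main.py shown earlier in this README.
def transcribe_chunks_parallel(chunk_paths, lang="hi-IN", max_workers=4):
    # Recognition calls are network-bound, so threads give an easy speed-up.
    # Chunks that fail become empty strings instead of crashing the whole run.
    def safe_recognize(path):
        try:
            return recognize_speech_from_wav(path, lang=lang)
        except Exception:
            return ""

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_recognize, chunk_paths))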

License

This project is licensed under the MIT License. See the LICENSE file for details.


