YouTube Video to Text Markdown Converter

yt-video-text-md is a Python package designed to retrieve and convert YouTube video transcripts/subtitles into Markdown files. This tool is particularly useful for extracting text from entire playlists or individual videos. It leverages the youtube-transcript-api for direct subtitle extraction and whisper for audio-to-text conversion when transcripts are unavailable.

Features

Playlist and Video Support: Extracts subtitles from both individual videos and entire playlists.
Fallback Mechanism: Utilizes whisper to transcribe audio if subtitles are not available.
Markdown Formatting: Outputs transcripts in Markdown format with video titles as headers.
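
As a rough illustration of the Markdown formatting feature, the sketch below turns video titles and transcript text into a single Markdown string. The function name `render_markdown`, the `(title, transcript)` input shape, and the choice of a level-one header are assumptions made for this example, not the package's actual internals.

```python
def render_markdown(videos):
    """Render (title, transcript) pairs as Markdown, one header per video.

    Hypothetical helper: the real package may use a different header
    level or layout.
    """
    sections = []
    for title, transcript in videos:
        # Collapse runs of whitespace so caption line breaks do not
        # leak into the output paragraph.
        body = " ".join(transcript.split())
        sections.append(f"# {title.strip()}\n\n{body}\n")
    # Separate consecutive videos with one blank line.
    return "\n".join(sections)


if __name__ == "__main__":
    print(render_markdown([("Intro", "hello\nworld"), ("Part 2", "more   text")]))
```

For a playlist, each video would contribute one such section, so the output file reads as a sequence of titled transcripts.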

Installation

Via pip

To install the latest release from PyPI, use:

pip install yt-video-text-md

Or, to install the latest version directly from the GitHub repository, use:

pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git

Building from Source

Clone the repository:

git clone https://github.com/kothiyarajesh/yt-video-text-md.git

Navigate to the project directory:
```
cd yt-video-text-md
```
Install the package:
```
pip install .
```

Usage

Python Script

Here's a simple example of how to use the yt-video-text-md library in a Python script:

from yt_video_text_md import YTVideoTextMD

# Define the URL of the YouTube video or playlist you want to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"

# Specify the directory where the output Markdown file will be saved
output_directory = "."

# Set the default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"

# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)
temporary_audio_directory = "/tmp"

# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory
)
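
Because the `url` parameter accepts either a video or a playlist link, it can help to see how the two kinds of URL differ. The sketch below uses only the standard library; the helper name `classify_youtube_url` and its detection rules are assumptions for illustration, and the package may tell playlists from videos in a different way.

```python
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url):
    """Return ("playlist", id) or ("video", id) for a YouTube URL.

    Hypothetical helper for illustration; the package may detect
    playlists differently.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Playlist pages carry the playlist ID in the "list" parameter.
    if parsed.path == "/playlist" and "list" in query:
        return "playlist", query["list"][0]
    # Regular watch pages carry the video ID in the "v" parameter.
    if "v" in query:
        return "video", query["v"][0]
    # Short links put the video ID in the path.
    if parsed.netloc == "youtu.be" and parsed.path.strip("/"):
        return "video", parsed.path.strip("/")
    raise ValueError(f"Not a recognised YouTube video or playlist URL: {url}")


if __name__ == "__main__":
    print(classify_youtube_url("https://www.youtube.com/watch?v=pzo13OPXZS4"))
```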

Command-Line Interface

You can also use the package from the command line:

yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"

Options:

-u or --url: URL of the YouTube video or playlist.
-d or --output-dir: Directory where the output Markdown file will be saved.
-f or --file-name: Name for the generated Markdown file.
-ad or --audio-dir: Directory where temporary audio files will be stored (used only if a transcript is not available).
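
The options above map naturally onto an `argparse` parser. The following is a hypothetical reconstruction for readers who want to wrap the library in their own script: the function name `build_parser`, the default values, and the decision to make `--url` required are assumptions, not taken from the package source.

```python
import argparse


def build_parser():
    """Build a parser with the four documented options.

    Hypothetical reconstruction: defaults and the `required` flag are
    assumptions, not taken from the package source.
    """
    parser = argparse.ArgumentParser(prog="yt-video-text-md")
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the YouTube video or playlist")
    parser.add_argument("-d", "--output-dir", default=".",
                        help="directory for the output Markdown file")
    parser.add_argument("-f", "--file-name", default="yt_video_2_text_md_",
                        help="name for the generated Markdown file")
    parser.add_argument("-ad", "--audio-dir", default="/tmp",
                        help="directory for temporary audio files")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    demo = ["-u", "https://www.youtube.com/watch?v=pzo13OPXZS4", "-f", "demo_"]
    print(build_parser().parse_args(demo))
```

The parsed values are available as `args.url`, `args.output_dir`, `args.file_name`, and `args.audio_dir`, which line up with the constructor parameters shown in the Python example.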

Notes

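Dependencies: This package relies on external libraries, including youtube-transcript-api and whisper. The repository lists them in requirements.txt; make sure all of them are installed, since the transcription fallback cannot run without whisper.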
Audio Extraction: If a video does not have an available transcript, the script will download the video, extract the audio, and convert it to text. This process requires a stable internet connection and may be resource-intensive, especially for long videos.
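
The fallback behaviour described in these notes can be pictured as a "try captions first, transcribe only on failure" pattern. In the sketch below, `fetch_transcript` and `transcribe_audio` are hypothetical callables standing in for youtube-transcript-api and whisper; the function name `get_text` and the broad exception handling are assumptions, not the package's actual code.

```python
def get_text(video_id, fetch_transcript, transcribe_audio):
    """Return (text, source), preferring captions over transcription.

    `fetch_transcript` and `transcribe_audio` are hypothetical callables
    standing in for youtube-transcript-api and whisper respectively.
    """
    try:
        text = fetch_transcript(video_id)
        if text:
            return text, "transcript"
    except Exception:
        # Any failure to fetch captions (disabled, missing, network)
        # falls through to the slower audio path.
        pass
    return transcribe_audio(video_id), "whisper"


if __name__ == "__main__":
    def no_captions(video_id):
        raise LookupError(f"no transcript for {video_id}")

    # Stub callables keep the demo runnable without network access.
    print(get_text("abc", lambda v: "caption text", lambda v: "audio text"))
    print(get_text("abc", no_captions, lambda v: "audio text"))
```

Passing the two steps in as callables keeps the expensive download and transcription out of the common path and makes the logic easy to test with stubs.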

License

This project is licensed under the MIT License. See the LICENSE file for details.
