🎥 Prompt2Clip

Prompt2Clip generates high-quality video clips from text prompts using YOLOv10 object detection models. It fine-tunes these models on custom datasets and supports real-time inference, delivering an efficient and flexible text-to-video workflow.


🚀 Key Features

  • Text-to-Video Conversion: Converts natural language prompts into video clips.
  • Custom Dataset Training: Fine-tunes YOLOv10 models with bird and bee datasets for enhanced detection.
  • Real-Time Inference: Supports single-image and streaming video detection.
  • Cloud-Based Workflow: Uses Google Colab for GPU-accelerated training and processing.
  • Customizable Parameters: Flexible settings for model size, inference steps, and detection thresholds.

🛠️ How It Works

Prompt2Clip combines advanced AI models and custom workflows:

  1. Dataset Integration: Downloads datasets from Roboflow for custom object detection tasks.
  2. Model Training: Fine-tunes YOLOv10 on labeled datasets for accurate detection.
  3. Video Generation: Combines frames produced by detection into cohesive video clips (a minimal sketch of this step follows the list).
  4. Inference Pipelines: Enables real-time detection on single images or streaming video.
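
As a concrete illustration of step 3, here is a minimal sketch of stitching detection frames into a clip with OpenCV. The script name, file layout, and frame rate are assumptions for illustration, not taken from this repository:

# assemble_clip.py -- illustrative sketch, not a script from this repository
import glob
import cv2

frames = sorted(glob.glob("outputs/frames/*.jpg"))   # assumed frame location
height, width = cv2.imread(frames[0]).shape[:2]

writer = cv2.VideoWriter(
    "outputs/clip.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),   # mp4v codec in an MP4 container
    24,                                # assumed frame rate
    (width, height),
)
for path in frames:
    writer.write(cv2.imread(path))     # all frames must share one resolution
writer.release()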

📂 Project Structure

Prompt2Clip/
│
├── datasets/           # Custom datasets for birds and bees
├── models/             # Pre-trained and fine-tuned YOLOv10 models
├── scripts/            # Scripts for training, inference, and video generation
├── examples/           # Example outputs of text-to-video generation
└── README.md           # Project documentation

🖥️ Usage

1. Clone the Repository

git clone https://github.com/MansurPro/Prompt2Clip.git
cd Prompt2Clip

2. Set Up the Environment

Run Prompt2Clip in Google Colab for GPU-accelerated operations. Install the required Python packages:

pip install -r requirements.txt
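
The exact dependencies are pinned in requirements.txt; a plausible minimal set for this stack (an assumption for illustration, not the repository's actual file) would be:

ultralytics      # YOLOv10 training and inference (assumed)
roboflow         # dataset download (assumed)
opencv-python    # frame handling and video assembly (assumed)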

3. Train the Model

python train.py --dataset datasets/birds --model yolov10m.pt --epochs 10
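
train.py is not reproduced here; a minimal sketch of what such a script might wrap, assuming the ultralytics package and a Roboflow-style data.yaml inside the dataset folder:

# train_sketch.py -- illustrative fine-tuning call, not the repository's train.py
from ultralytics import YOLO

model = YOLO("yolov10m.pt")            # pretrained checkpoint from the command above
model.train(
    data="datasets/birds/data.yaml",   # assumed Roboflow export layout
    epochs=10,
    imgsz=640,                         # assumed input size
)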

4. Run Inference

python inference.py --image_path path/to/image.jpg --model_path models/yolov10_best.pt
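
Likewise, a minimal sketch of single-image inference under the same ultralytics assumption; the confidence threshold and output path are illustrative:

# inference_sketch.py -- illustrative detection call, not the repository's inference.py
from ultralytics import YOLO

model = YOLO("models/yolov10_best.pt")                    # fine-tuned weights
results = model.predict("path/to/image.jpg", conf=0.25)   # assumed threshold
results[0].save(filename="outputs/annotated.jpg")         # annotated output image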

🎨 Examples

Prompt                            Generated Video
"A bird flying over a forest"     View
"A bee hovering near a flower"    View

📊 Performance

  • Efficiency: Fine-tuned for fast and accurate text-to-video generation.
  • Customizability: Supports flexible detection thresholds and model configurations.
  • Scalability: Leverages GPU resources for high-throughput operations.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙌 Acknowledgments

Prompt2Clip builds on the following open-source tools and datasets:

  • YOLOv10 for object detection.
  • Roboflow for dataset integration.
  • Google Colab for cloud-based GPU acceleration.

Thank you to the open-source community for enabling innovative solutions like this!
