This app uses the default audio-generation logic from Riffusion Hobby to generate audio from user prompts.
You can try it via two deployment links: https://manipulation-video-v1.streamlit.app/ and https://manipulation-video-v2.streamlit.app/
The Video Manipulation App is designed to help users split a video into multiple parts, generate custom audio based on user input, and attach the generated audio to a selected video part.
- Clone the Repository: Open a terminal and run the following command to clone this repository:
  git clone https://github.com/vicnesterenko/video-manipulation.git
- Navigate to the Directory: Change to the project directory:
  cd video-manipulation
- Create Virtual Environment: Create and activate a virtual environment:
  python -m venv your_venv_name
  source your_venv_name/bin/activate  # For Windows use `your_venv_name\Scripts\activate`
- Install Dependencies: Install the required dependencies using pip:
  pip install -r requirements.txt
- Docker Setup: Build and run the Docker container:
  docker build -t video-manipulation-app .
  docker run -p 8501:8501 video-manipulation-app
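Once the container is running, the Streamlit interface should be reachable at http://localhost:8501 (the port published by `-p 8501:8501`).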
Two types of UI are provided:
- A single page with vertical scrolling: `app.py`
- Four pages, with the option to start over after the last action: `app1.py`
To run the application, execute the following command:
streamlit run app.py
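To launch the four-page UI instead, run the same command with the other entry point:
streamlit run app1.py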
The application provides a Streamlit interface with the following features:
- Upload a Video: The user uploads a video file in formats like MP4, MOV, or AVI.
- Split Video: The user specifies the number of parts to split the video into and the number of columns for displaying the video clips. The app then splits the video accordingly (a minimal splitting sketch is shown after this list).
- Generate and Add Audio: The user inputs a prompt for audio generation, along with other parameters like negative prompt, seeds, and number of inference steps. The app generates the audio and attaches it to the selected video part.
- Download: The user can download an archive containing all the video parts and their corresponding audios.
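As a rough illustration of how such splitting can be done with moviepy (a minimal sketch, not the app's actual implementation; the function name, part count, and output file names are hypothetical, and the `subclip` call assumes the moviepy 1.x API):

```python
# Minimal sketch of even video splitting with moviepy 1.x.
# Names here (split_video, part_*.mp4) are illustrative, not the app's own API.
from moviepy.editor import VideoFileClip

def split_video(input_path: str, num_parts: int) -> list[str]:
    """Split a video into num_parts roughly equal clips and return their paths."""
    clip = VideoFileClip(input_path)
    part_duration = clip.duration / num_parts
    output_paths = []
    for i in range(num_parts):
        start = i * part_duration
        end = min((i + 1) * part_duration, clip.duration)
        part = clip.subclip(start, end)
        out_path = f"part_{i + 1}.mp4"
        part.write_videofile(out_path, codec="libx264", audio_codec="aac")
        output_paths.append(out_path)
    clip.close()
    return output_paths
```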
- ✨ Better Performance with GPU: It is recommended to run the app with a GPU for better performance. To check whether CUDA is available on your machine, run the following in a Python console:
  import torch
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  print(device)
- ✨ Spectrogram Length Calculation: The length of the spectrogram is determined using the formula:
  generating_audio_duration = width * hop_length / sample_rate
  However, due to potential inaccuracies in training, an additional length of 320 is added (chosen because it is divisible by 8), ensuring extra generated seconds that can be trimmed using the `add_audio_to_video` method from `riffusion.streamlit.tasks.video_processing`.
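For intuition, here is a quick worked example of the formula; the `hop_length` and `sample_rate` values below are typical Riffusion defaults used as assumptions, not values read from this app's code:

```python
# Worked example of the duration formula above.
# hop_length and sample_rate are assumed typical Riffusion values,
# not values taken from this app's configuration.
width = 512          # spectrogram width in pixels (columns)
hop_length = 512     # audio samples advanced per spectrogram column
sample_rate = 44100  # audio samples per second

generating_audio_duration = width * hop_length / sample_rate
print(f"{generating_audio_duration:.2f} s")  # ~5.94 s
```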
- ✨ Fine-Tune Audio Generation Parameters: For more personalized audio output, you can fine-tune the audio generation parameters such as `prompt`, `seeds`, and `number of inference steps`. This allows you to experiment with different settings to achieve the desired audio effects and quality. Here is a brief overview of these parameters:
  - Prompt: Guides the audio generation by describing what you want to hear, while the negative prompt specifies what you do not want in the audio.
  - Seeds: Seeds control the randomness in the generation process, ensuring reproducible results if needed.
  - Number of Inference Steps: Adjusting this parameter can balance the quality and speed of audio generation. More steps usually mean better quality but longer processing time.
By customizing these settings, you can create unique audio tracks tailored to your specific needs and preferences.
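As a general illustration of how a fixed seed makes generation reproducible (standard PyTorch practice, shown here with plain random noise rather than the app's actual generation call):

```python
# General illustration of seed-based reproducibility (not the app's exact code).
import torch

seed = 42
device = "cuda" if torch.cuda.is_available() else "cpu"

# A torch.Generator seeded the same way produces the same random numbers,
# which is what makes diffusion outputs repeatable for a fixed seed.
generator = torch.Generator(device=device).manual_seed(seed)
noise_a = torch.randn(4, generator=generator, device=device)

generator = torch.Generator(device=device).manual_seed(seed)
noise_b = torch.randn(4, generator=generator, device=device)

print(torch.equal(noise_a, noise_b))  # True: identical starting noise
```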
- ✨ Output Directory and Archive Naming: The output videos are saved in the `output` folder, with each part named using a part number and a unique UUID. The archive of the files is named in a user-friendly format with a timestamp, making it easy to identify and manage.
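A minimal sketch of this naming scheme (the exact file-name patterns are illustrative; the app's actual strings may differ):

```python
# Illustrative sketch of the naming scheme -- exact strings in the app may differ.
import shutil
import uuid
from datetime import datetime
from pathlib import Path

output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

# Each video part gets a part number plus a unique UUID.
part_number = 1
part_name = f"part_{part_number}_{uuid.uuid4().hex}.mp4"
print(output_dir / part_name)

# The archive name includes a human-readable timestamp.
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
archive_path = shutil.make_archive(f"video_parts_{timestamp}", "zip", output_dir)
print(archive_path)
```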