MUSYN is a system for real-time musical co-creation that generates visual art from music. The project proposes a modular workflow spanning audio capture and preprocessing, transformation of music into textual descriptions, and image synthesis. The architecture is deliberately flexible, so that specific artificial intelligence models can be swapped into each stage as needs and technology evolve. By establishing a direct, reactive connection between sound and image, MUSYN goes beyond the traditional concept of synesthesia, providing a platform for exploring and materializing a new form of interactive visual art in which music becomes the creative engine for visual expression.
Installation:
Run ./start.sh, which:
- Installs all Python dependencies (see setup.py).
- Downloads the required music captioning and image generation models.
Usage:
- Run the app:
  python app.py
- Open the web interface: the app launches a Gradio interface in your browser (a minimal sketch of this wiring follows the list).
- Choose your mode:
  - Live Audio: use your microphone to generate images from live music.
  - File Audio: upload an audio file for processing.
- Interact:
  - View generated captions and images in real time.
  - Adjust image width/height and use example prompts.
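
For orientation, here is a minimal sketch of how such a Gradio front end could be wired. The function music_to_image, its body, and the slider ranges are illustrative placeholders, not the actual code in app.py.

```python
import gradio as gr

def music_to_image(audio_path, width, height):
    # Placeholder: the real app would run the captioning and
    # image-generation pipelines and return (caption, PIL image).
    caption = f"caption for {audio_path} at {width}x{height}"
    return caption, None

demo = gr.Interface(
    fn=music_to_image,
    inputs=[
        # Gradio 4.x: `sources` enables both live microphone and file upload
        gr.Audio(sources=["microphone", "upload"], type="filepath", label="Music input"),
        gr.Slider(256, 1024, value=512, step=64, label="Image width"),
        gr.Slider(256, 1024, value=512, step=64, label="Image height"),
    ],
    outputs=[gr.Textbox(label="Generated caption"), gr.Image(label="Generated image")],
)

if __name__ == "__main__":
    demo.launch()
```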
How It Works:
- Audio Preprocessing: Audio is captured (live or from a file) and preprocessed using utilities in utils/audio_utils.py (a preprocessing sketch follows this list).
- Music Captioning: The audio is passed to a music captioning model (model/music2txt.py), which uses a BART-based architecture to generate descriptive text.
- Image Generation: The caption is fed into a Stable Diffusion XL Turbo pipeline (model/txt2img.py) to generate an image (a generic diffusers sketch also follows below).
- Web Interface: The Gradio app (app.py) ties these stages together in a user-friendly interface for real-time interaction.
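
As a rough illustration of the first two stages, the sketch below loads and resamples audio with librosa and frames it for a captioning model. The Music2Txt call at the end is a hypothetical stand-in for the actual interface in model/music2txt.py, and the 16 kHz / 10-second window follows the LP-MusicCaps convention rather than anything confirmed in this repository.

```python
import librosa
import numpy as np

TARGET_SR = 16_000   # assumption: LP-MusicCaps-style captioners expect 16 kHz mono
CHUNK_SECONDS = 10   # assumption: caption one fixed-length window at a time

def preprocess(path: str) -> np.ndarray:
    """Load audio, downmix to mono, resample, and trim or zero-pad
    to a fixed-length window suitable for the captioning model."""
    waveform, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    target_len = TARGET_SR * CHUNK_SECONDS
    if len(waveform) >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))

# Hypothetical hand-off to the captioning stage:
# caption = Music2Txt().caption(preprocess("song.wav"))
```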
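The image-generation stage can be approximated with the public diffusers API for SDXL Turbo; this is a generic sketch of that pipeline, not the exact configuration in model/txt2img.py.

```python
import torch
from diffusers import AutoPipelineForText2Image

# SDXL Turbo is distilled to work with very few denoising steps,
# which is what makes near-real-time generation feasible.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

caption = "energetic electronic track with driving synths"  # example caption
image = pipe(
    prompt=caption,
    num_inference_steps=1,  # the Turbo variant is designed for a single step
    guidance_scale=0.0,     # guidance is disabled for SDXL Turbo
    width=512,
    height=512,
).images[0]
image.save("output.png")
```

Running at a single inference step with guidance disabled trades some fidelity for latency, which matches the project's real-time goal.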
Project Structure:
musyn/
├── app.py # Main Gradio web application
├── config.py # UI and model configuration
├── setup.py # Python package setup and dependencies
├── start.sh # Installation and model download script
├── utils/
│ └── audio_utils.py # Audio loading and preprocessing utilities
├── model/
│ ├── bart.py # BART-based captioning model definition
│ ├── modules.py # Audio encoder and feature extraction modules
│ ├── music2txt.py # Music-to-text (captioning) pipeline
│   ├── txt2img.py # Text-to-image (Stable Diffusion XL Turbo) pipeline
│   └── models/ # Downloaded model weights (auto-created)
├── LICENSE
└── README.md
References:
- Exploring Real-Time Music-to-Image Systems for Creative Inspiration in Music Creation
- LP-MusicCaps: LLM-Based Pseudo Music Captioning
- ArtSpew
- SDXLTurbo
- Ultimate guide to optimizing Stable Diffusion XL
- StreamDiffusion
Roadmap:
- Generate images in real time.
- Add an option for 5-second audio input.
- Improve real-time image generation.
- Publish a demo on a Hugging Face Space.
License:
GNU GPLv3