Musyn: A Real-time Music-to-Image Co-creation System

Overview

MUSYN is a system for real-time musical co-creation that generates visual art from music. It proposes a modular workflow spanning audio capture and preprocessing, music-to-text captioning, and image synthesis. The architecture is deliberately flexible: each stage can swap in a different AI model as needs and technology evolve. By establishing a direct, reactive connection between sound and image, MUSYN moves beyond the traditional notion of synesthesia, offering a platform to explore a new form of interactive visual art in which music becomes the creative engine for visual expression.


Installation

Run the setup script:

  ./start.sh

The script:
  • Installs all Python dependencies (see setup.py).
  • Downloads the required music captioning and image generation models.
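If the script fails partway through, the dependency step can be checked in plain Python. This is a minimal sketch, assuming an illustrative package list; the authoritative list lives in setup.py:

```python
import importlib.util

# Hypothetical dependency list -- the real one is declared in setup.py.
REQUIRED = ["gradio", "torch", "transformers", "diffusers"]

def missing_packages(names):
    """Return the subset of top-level packages that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies present.")
```

Running it before app.py gives a quick sanity check without re-running the full installer.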

Usage

  1. Run the app
    python app.py
  2. Open the web interface
    The app launches a Gradio interface in your browser.
  3. Choose your mode
    • Live Audio: Use your microphone to generate images from live music.
    • File Audio: Upload an audio file for processing.
  4. Interact
    • View generated captions and images in real time.
    • Adjust image width/height and use example prompts.
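In File Audio mode the recording is processed in fixed-length windows. The sketch below shows one way to chunk a flat list of PCM samples into such windows; the 5-second length and 16 kHz sample rate are assumptions for illustration, not values read from the code:

```python
def chunk_samples(samples, sample_rate=16_000, window_s=5.0):
    """Split a flat list of PCM samples into fixed-length windows.

    The final partial window, if any, is dropped so that every chunk
    handed to the captioning model has the same duration.
    """
    win = int(sample_rate * window_s)
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, win)]
```

Dropping the trailing partial window keeps every caption comparable; an alternative design would zero-pad it instead.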

Musyn Architecture

  • Audio Preprocessing:
    Audio is captured (live/file) and preprocessed using utilities in utils/audio_utils.py.
  • Music Captioning:
    The audio is passed to a music captioning model (model/music2txt.py), which uses a BART-based architecture to generate descriptive text.
  • Image Generation:
    The caption is fed into a Stable Diffusion XL Turbo pipeline (model/txt2img.py) to generate an image.
  • Web Interface:
    The Gradio app (app.py) provides a user-friendly interface for real-time interaction.
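The four stages above compose into a simple linear pipeline. The sketch below stubs out the model calls to show the data flow only; the function names and signatures are hypothetical placeholders, not the actual interfaces of music2txt.py or txt2img.py:

```python
def preprocess_audio(raw):
    """Stand-in for utils/audio_utils.py: peak-normalize samples to [-1, 1]."""
    peak = max((abs(x) for x in raw), default=1) or 1
    return [x / peak for x in raw]

def music_to_caption(samples):
    """Stand-in for model/music2txt.py (BART-based captioner)."""
    return "an upbeat piano melody"  # a real model would describe `samples`

def caption_to_image(caption, width=512, height=512):
    """Stand-in for model/txt2img.py (Stable Diffusion XL Turbo)."""
    return {"prompt": caption, "size": (width, height)}

def pipeline(raw_audio):
    """Audio -> caption -> image, mirroring the architecture above."""
    samples = preprocess_audio(raw_audio)
    caption = music_to_caption(samples)
    return caption, caption_to_image(caption)
```

Because each stage only consumes the previous stage's output, any one model can be replaced without touching the others, which is the modularity the architecture section describes.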

Code Structure

musyn/
├── app.py                # Main Gradio web application
├── config.py             # UI and model configuration
├── setup.py              # Python package setup and dependencies
├── start.sh              # Installation and model download script
├── utils/
│   └── audio_utils.py    # Audio loading and preprocessing utilities
├── model/
│   ├── bart.py           # BART-based captioning model definition
│   ├── modules.py        # Audio encoder and feature extraction modules
│   ├── music2txt.py      # Music-to-text (captioning) pipeline
│   └── txt2img.py        # Text-to-image (Stable Diffusion XL Turbo) pipeline
├── model/models/         # Downloaded model weights (auto-created)
├── LICENSE
└── README.md


Next Steps

  • Generate images in real time.
  • Add an option for 5-second audio input.
  • Improve real-time image generation performance.
  • Publish a demo on a Hugging Face Space.

License

GNU GPLv3
