
This project is a multi-modal model that combines multiple models to accept audio, images, and text as inputs and generate corresponding audio, image, and text outputs.


Kind-Unes/MultiModal-Model


Project Name

Multi-Modal Model Python Project

Overview

This project is a multi-modal model that accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.

Features

  • Streamlit Interface: coming soon
  • Input Modalities: Audio, Images, Text, Videos, Emojis, Multiple combined inputs
  • Output Modalities: Audio, Images, Text, Videos, Emojis, Segmented images, Object-detection coordinates for images, Multiple combined outputs
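The README does not describe how each input is matched to one of the combined models. As a minimal sketch, assuming inputs arrive as file paths and that the modality can be guessed from the file extension, a router might look like this. `EXTENSION_MODALITY`, `detect_modality`, `route`, and the placeholder handlers are hypothetical names, not part of this repository.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical extension-to-modality table; the project's real input handling may differ.
EXTENSION_MODALITY = {
    ".wav": "audio", ".mp3": "audio", ".flac": "audio",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp4": "video", ".avi": "video",
    ".txt": "text",
}


def detect_modality(path: str) -> str:
    """Guess an input's modality from its file extension, defaulting to text."""
    return EXTENSION_MODALITY.get(Path(path).suffix.lower(), "text")


def route(inputs: List[str], handlers: Dict[str, Callable[[str], str]]) -> List[Tuple[str, str]]:
    """Send each input to the handler registered for its modality (multi-input support)."""
    results = []
    for item in inputs:
        modality = detect_modality(item)
        handler = handlers.get(modality)
        if handler is None:
            raise ValueError(f"No handler registered for modality {modality!r}")
        results.append((modality, handler(item)))
    return results


if __name__ == "__main__":
    # Placeholder handlers standing in for real STT, IMG2TXT, and text-generation models.
    handlers = {
        "audio": lambda p: f"transcript of {p}",
        "image": lambda p: f"caption of {p}",
        "text": lambda p: f"reply to {p}",
    }
    for modality, output in route(["question.txt", "photo.JPG", "clip.wav"], handlers):
        print(modality, "->", output)
```

Strings without a recognized extension fall back to the text handler, which is also where emoji input would go in this sketch.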

Getting Started

Prerequisites

  • Python 3.x
  • Dependencies listed in requirements.txt

Installation

git clone https://github.com/Kind-Unes/Multi-Model-V1.git
cd 'MultiMODEL Template'
pip install -r requirements.txt
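After installing, a quick check can confirm that each package named in requirements.txt is present as an installed distribution. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`) and simple `name==version` style lines; `missing_requirements` is a hypothetical helper, not a script shipped with the project.

```python
import os
import re
import sys
from importlib import metadata


def missing_requirements(lines):
    """Return requirement names from requirements.txt-style lines that are not installed."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r / -e
            continue
        # Keep only the distribution name, dropping extras, version specifiers, and markers.
        name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        sys.exit("Python 3.8+ is needed for importlib.metadata")
    path = "requirements.txt"
    if not os.path.exists(path):
        print("No requirements.txt found in", os.getcwd())
    else:
        with open(path, encoding="utf-8") as fh:
            missing = missing_requirements(fh)
        if missing:
            print("Missing packages:", ", ".join(missing))
        else:
            print("All requirements appear to be installed.")
```

Run it from the directory that contains requirements.txt. URL-based or editable requirement lines are not handled by this simple parser.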

Usage

python model.py

Credits

TXT2IMG Models

Text Generation Model

IMG2TXT Model

TTS Model

STT Model

Others

Websites
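The credited components (speech-to-text, text generation, text-to-speech, image-to-text, text-to-image) suggest that some outputs are produced by feeding one model's result into the next. The sketch below assumes that chaining design; `Stage`, `run_pipeline`, and the lambda stand-ins are hypothetical placeholders, not the project's models or API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    """One component model in the chain; ``fn`` is a hypothetical stand-in for it."""
    name: str
    fn: Callable[[object], object]


def run_pipeline(stages: Sequence[Stage], value, trace=None):
    """Feed ``value`` through each stage in order, optionally recording intermediate outputs."""
    for stage in stages:
        value = stage.fn(value)
        if trace is not None:
            trace.append((stage.name, value))
    return value


# Placeholder stages: real STT, text-generation, and TTS models would replace these lambdas.
speech_to_speech = [
    Stage("STT", lambda audio: audio.decode("utf-8")),  # pretend the audio bytes hold a transcript
    Stage("Text Generation", lambda text: f"Answer to: {text}"),
    Stage("TTS", lambda text: text.encode("utf-8")),  # pretend encoding stands in for synthesized audio
]

if __name__ == "__main__":
    steps = []
    out = run_pipeline(speech_to_speech, b"hello", trace=steps)
    print(out)
    print(steps)
```

An image-to-image flow would follow the same pattern, with an IMG2TXT stage followed by a TXT2IMG stage.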
