中文文档 (Chinese documentation)

Introduction

PaddleMIX is a multi-modal large-model development kit based on PaddlePaddle. It brings together image, text, and video modalities and covers a wide range of multi-modal tasks, including vision-language pre-training, text-to-image generation, and text-to-video generation. It provides an out-of-the-box development experience while supporting developers' flexible customization needs and the exploration of general artificial intelligence.

Updates

2024.04.17

  • PPDiffusers released version 0.24.0, adding support for DiT and other Sora-related technologies, as well as SVD and other video generation models.

2023.10.7

  • Released PaddleMIX version 1.0
  • Added distributed training capability for image-text pre-training models; BLIP-2 now supports training at the hundred-billion-parameter scale.
  • Added the cross-modal application pipeline AppFlow, which supports automatic annotation, image editing, audio-to-image, and 11 other cross-modal applications with a single click.
  • PPDiffusers released version 0.19.3, introducing SDXL and related tasks.

2023.7.31

  • Released PaddleMIX version 0.1
  • First release of the PaddleMIX multi-modal large-model development toolkit, integrating the PPDiffusers multi-modal diffusion model toolbox and providing broad support for PaddleNLP large language models.
  • Added 12 large multi-modal models, including EVA-CLIP, BLIP-2, miniGPT-4, Stable Diffusion, and ControlNet.

Main Features

  • Rich Multi-Modal Functionality: covers image-text pre-training, text-to-image generation, and multi-modal vision tasks, enabling diverse functions such as image editing, image captioning, and data annotation.
  • Simplified Development Experience: a unified model development interface enables efficient custom model development and feature implementation (see the sketch after this list).
  • Efficient Training and Inference Workflow: a streamlined end-to-end process for training and inference, with industry-leading performance on key models such as BLIP-2 and Stable Diffusion.
  • Support for Ultra-Large-Scale Training: supports image-text pre-training at the hundred-billion-parameter scale and text-to-image base models at the ten-billion-parameter scale.
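
As a minimal sketch of that unified interface, the example below generates an image from text with AppFlow. The app name and model identifier are illustrative assumptions; consult the AppFlow documentation for the exact values supported by your version.

# A minimal AppFlow sketch (assumed API shape; the app name and model
# identifier below are illustrative, not guaranteed to match your version).
from paddlemix.appflow import Appflow

task = Appflow(app="text2image_generation",
               models=["stabilityai/stable-diffusion-v1-5"])
result = task(prompt="a red apple on a wooden table")
result["result"].save("apple.png")  # the output key may differ by task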

Demo

  • Video demo: PaddleMix.mp4

Installation

  1. Environment Dependencies
pip install -r requirements.txt

Detailed installation tutorials for PaddlePaddle

Note: some models in ppdiffusers require CUDA 11.2 or higher. If your local machine does not meet this requirement, it is recommended to use AI Studio for model training and inference tasks.

If you wish to train and run inference with bf16, please use a GPU that supports bf16, such as the A100.
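
A quick way to check bf16 support locally (a minimal sketch; on NVIDIA GPUs, bf16 requires compute capability 8.0 or higher, i.e. Ampere or newer):

# Check whether the local GPU can run bf16 (compute capability >= 8.0).
import paddle

if paddle.is_compiled_with_cuda():
    major, minor = paddle.device.cuda.get_device_capability()
    supported = "supported" if major >= 8 else "not supported"
    print(f"Compute capability {major}.{minor}: bf16 {supported}")
else:
    print("No CUDA device available")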

  2. Manual Installation
git clone https://github.com/PaddlePaddle/PaddleMIX
cd PaddleMIX
pip install -e .

# install ppdiffusers
cd ppdiffusers
pip install -e .
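
After installation, a quick import check (a minimal sanity test) confirms that both packages are available:

# Verify the editable installs by importing both packages.
import paddle
import paddlemix
import ppdiffusers

print("paddle:", paddle.__version__)
print("ppdiffusers:", ppdiffusers.__version__)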

Tutorial

Specialized Applications

  1. Artistic Style QR Code Model
  2. Image Mixing

Datasets

Multi-modal Pre-training
  • Image-Text Pre-training
  • Open World Vision Models
  • More Multi-Modal Pre-trained Models

Diffusion-based Models
  • Text-to-Image (see the sketch below)
  • Text-to-Video
  • Audio Generation

For more information on additional model capabilities, please refer to the Model Capability Matrix.
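
As a taste of the text-to-image capability, here is a minimal ppdiffusers sketch. The API mirrors Hugging Face diffusers; the model identifier is an assumed example, so substitute one from the PPDiffusers model list for your version.

# Minimal text-to-image sketch with ppdiffusers (API mirrors HF diffusers).
# The model identifier is an assumed example; pick one from the PPDiffusers
# model list for your version.
from ppdiffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("A small cabin in a snowy forest, watercolor").images[0]
image.save("cabin.png")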

LICENSE

This repository is licensed under the Apache 2.0 license.