PaddleMIX is a large multi-modal development toolkit based on PaddlePaddle. It aggregates image, text, and video modalities and covers a wide range of multi-modal tasks, including vision-language pre-training, text-to-image, and text-to-video generation. It provides an out-of-the-box development experience while supporting flexible customization, helping developers explore general artificial intelligence.
2024.04.17
- PPDiffusers released version 0.24.0, adding support for Sora-related technologies such as DiT, as well as video generation models such as SVD.
2023.10.7
- Published PaddleMIX version 1.0
- Added distributed training capability for image-text pre-training models; BLIP-2 now supports training at the hundred-billion-parameter scale.
- Added the cross-modal application pipeline AppFlow, which supports automatic annotation, image editing, sound-to-image, and 11 other cross-modal applications with a single click.
- PPDiffusers released version 0.19.3, introducing SDXL and related tasks.
2023.7.31
- Published PaddleMIX version 0.1
- The PaddleMIX large multi-modal model development toolkit was released for the first time, integrating the PPDiffusers multi-modal diffusion model toolbox and broadly supporting PaddleNLP large language models.
- Added 12 new large multi-modal models, including EVA-CLIP, BLIP-2, miniGPT-4, Stable Diffusion, and ControlNet.
- Rich Multi-Modal Functionality: covers image-text pre-training, text-to-image generation, and multi-modal visual tasks, enabling diverse functions such as image editing, image description, and data annotation.
- Simplified Development Experience: a unified model development interface facilitates efficient custom model development and feature implementation.
- Efficient Training and Inference Workflow: a streamlined end-to-end process for training and inference, with industry-leading performance for key models such as BLIP-2 and Stable Diffusion.
- Support for Ultra-Large-Scale Training: capable of image-text pre-training at the hundred-billion-parameter scale, and of training text-to-image base models at the ten-billion-parameter scale.
- Video Demo
PaddleMix.mp4
- Environment Dependencies
pip install -r requirements.txt
Detailed installation tutorials for PaddlePaddle
Note: some models in ppdiffusers require CUDA 11.2 or higher. If your local machine does not meet this requirement, we recommend using AI Studio for model training and inference tasks.
If you wish to train and run inference with bf16, please use a GPU that supports bf16, such as the A100.
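Whether a GPU can run bf16 comes down to its CUDA compute capability: bf16 tensor operations require sm_80 or newer (the A100 is sm_80). The helper below is a minimal, standard-library-only sketch of that check; on a live PaddlePaddle install, the capability tuple it takes can be queried with `paddle.device.cuda.get_device_capability()`.

```python
def supports_bf16(capability):
    """True if a GPU's (major, minor) CUDA compute capability can run bf16.

    bf16 requires compute capability 8.0 or higher (e.g. A100 is sm_80).
    On a live PaddlePaddle install, the tuple can be obtained with
    paddle.device.cuda.get_device_capability().
    """
    major, minor = capability
    return (major, minor) >= (8, 0)

print(supports_bf16((8, 0)))  # A100 -> True
print(supports_bf16((7, 5)))  # T4   -> False
```

If this check fails on your hardware, fall back to fp16 or fp32 training, or move the workload to AI Studio as suggested above.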
- Manual Installation
git clone https://github.com/PaddlePaddle/PaddleMIX
cd PaddleMIX
pip install -e .
# Install ppdiffusers
cd ppdiffusers
pip install -e .
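After both `pip install -e .` steps, a quick sanity check is to confirm that the packages resolve on the current Python path. This sketch uses only the standard library; the package names `paddlemix` and `ppdiffusers` correspond to the two editable installs above.

```python
import importlib.util

def installed(pkg: str) -> bool:
    """Return True if `pkg` can be found on the current Python path."""
    return importlib.util.find_spec(pkg) is not None

# After the editable installs above, both should report "OK".
for pkg in ("paddlemix", "ppdiffusers"):
    print(pkg, "OK" if installed(pkg) else "missing")
```

If either package reports "missing", re-run the corresponding `pip install -e .` from its directory and check that it completed without errors.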
- Artistic Style QR Code Model
Try it out: https://aistudio.baidu.com/community/app/1339
- Image Mixing
Try it out: https://aistudio.baidu.com/community/app/1340
Supported model categories: Multi-modal Pre-training | Diffusion-based Models
For more information on additional model capabilities, please refer to the Model Capability Matrix.
This repository is licensed under the Apache 2.0 License.