Thank you for checking out the Multimodal-Large-Language-Model project. Please note that this project was created for research purposes.
For a more robust and well-developed solution, you may consider using open-webui/open-webui with ollama/ollama.
You can access the project documentation at [GitHub Pages].
- Docker: [Installation Guide]
- Docker Compose: [Installation Guide]
- Compatible with Linux and Windows hosts
- Ensure ports 8501 and 11434 are not already in use (a quick check is shown after these prerequisites)
- You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. [Source]
- The project can be run on either CPU or GPU
- NVIDIA Container Toolkit (Linux) [Installation Guide]
- NVIDIA CUDA Toolkit (Windows) [Installation]
- WSL (Windows) [Installation]
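Before building, you can verify that the required ports are free and, for GPU runs, that containers can see the GPU. A minimal sketch, assuming a Linux host with `ss` available and the NVIDIA Container Toolkit installed:

```sh
# Ports 8501 and 11434 must not already be bound on the host
ss -ltn | grep -E ':(8501|11434)' || echo "ports are free"

# Sample GPU workload: the container should print the same
# GPU table as running nvidia-smi directly on the host
docker run --rm --gpus all ubuntu nvidia-smi
```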
| Model Name | Size  | Link |
|------------|-------|------|
| llava:7b   | 4.7GB | Link |
| llava:34b  | 20GB  | Link |
LLaVA is pulled and loaded by default; other models from Ollama can be added in `ollama/ollama-build.sh`.
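The build script itself is not shown here, so the following is only an illustrative sketch of what adding a model might look like; check the actual `ollama/ollama-build.sh` for its real contents:

```sh
#!/bin/sh
# Illustrative sketch of ollama/ollama-build.sh -- not the actual file contents.
ollama pull llava:7b   # pulled by default
ollama pull bakllava   # hypothetical example: add further Ollama models as extra pull lines
```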
Note: The project runs on the GPU by default. To run on CPU, use `docker-compose.cpu.yml` instead.
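For example, a CPU-only run could be started with (assuming `docker-compose.cpu.yml` sits at the repository root):

```sh
docker compose -f docker-compose.cpu.yml up -d --build
```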
- Clone this repository and navigate to the project folder:

```sh
git clone https://github.com/NotYuSheng/Multimodal-Large-Language-Model.git
cd Multimodal-Large-Language-Model
```
- Build the Docker images:

```sh
docker compose build
```
- Start the containers (a status check is shown after these steps):

```sh
docker compose up -d
```
- Access the Streamlit webpage from the host at `<host-ip>:8501`
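To confirm both containers came up, the standard Docker Compose checks apply (the first start may take a while, since the model is pulled):

```sh
docker compose ps        # both services should show as running
docker compose logs -f   # follow logs, e.g. to watch the model download
```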
API calls to the Ollama server can be made to `<host-ip>:11434`
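For example, a text-plus-image generation request via Ollama's documented REST endpoint `/api/generate` (replace the placeholders with real values):

```sh
curl http://<host-ip>:11434/api/generate -d '{
  "model": "llava:7b",
  "prompt": "Describe this image.",
  "images": ["<base64-encoded-image>"],
  "stream": false
}'
```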
Thank you for showing your support by starring this project. Your recognition is greatly appreciated! ⭐