Automated conversion of any HuggingFace model to multiple GGUF quantization formats
Supports continuous monitoring, auto-detection, and flexible upload modes
Universal GGUF LLMs Converter is a production-ready, Docker-based solution for automatically converting HuggingFace models to GGUF format with multiple quantization types. Built with llama.cpp integration and intelligent tokenizer detection, this tool streamlines the conversion workflow for both personal and community models.
- Continuous Monitoring: Automatically detects and converts new model updates from HuggingFace repositories
- Auto-Detection: Intelligent tokenizer detection for 50+ popular model architectures (Qwen, Llama, Mistral, Phi, Gemma, etc.)
- Multiple Quantizations: Supports F16, F32, BF16, and all K-quant formats (Q2_K to Q8_0)
- Flexible Deploy: Three upload modes - same repository, new repository, or local-only storage
- Smart Cleanup: Automatic temporary-file management to keep disk usage in check
- Docker: Fully containerized with optimized build times and resource usage
- Progress Tracking: Clean, milestone-based logging with colorized console output
System Requirements:
- Linux-based VPS or local machine
- Docker & Docker Compose installed
- HuggingFace account with WRITE access token
- Sufficient disk space for model downloads and conversion (varies by model size)
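A quick way to verify the prerequisites (a minimal sketch; on older setups the Compose plugin may instead be the separate docker-compose binary):

```bash
# Check Docker, the Compose plugin, and free disk space in the current directory
docker --version && docker compose version
df -h .
```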
Project structure:

```
gguf-convert-model/
├── .env
├── .env.example
├── .gitignore
├── .dockerignore
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── README.md
├── scripts/
│   └── start.sh
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   └── utils/
│       ├── __init__.py
│       ├── logger.py
│       └── helpers.py
└── logs/ (auto-created)
```

HuggingFace Access Token:
- Visit settings → https://huggingface.co/settings/tokens
- Create a new token with Write permissions
- Copy the token (it starts with hf_)
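Optionally, verify the token works before wiring it into the config by calling HuggingFace's whoami-v2 API endpoint (substitute your own token for the placeholder):

```bash
# A valid token returns your account details as JSON; an invalid one returns an error
curl -s -H "Authorization: Bearer hf_xxxxxxxx" https://huggingface.co/api/whoami-v2
```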
Install Docker & Docker Compose if not already installed. The script below is optional - use it only if Docker is missing, and review any script before piping it to bash:

```bash
curl -sSL https://raw.githubusercontent.com/arcxteam/succinct-prover/refs/heads/main/docker.sh | sudo bash
```
Clone the repository:

```bash
git clone https://github.com/arcxteam/gguf-convert-model.git
cd gguf-convert-model
```
Create, edit, and save the configuration file:

```bash
cp .env.example .env
nano .env
```
Example environment configuration:

```env
# HF token with WRITE permission
HUGGINGFACE_TOKEN=hf_xxxxxxxx
# Source model repository to convert
# Example: Qwen/Qwen3-0.6B
REPO_ID=username/model-name
# Check interval in seconds
# Default 0 = one-time conversion; set a higher value to keep polling for new commits
CHECK_INTERVAL=0
# Output formats (comma-separated, no spaces)
# Available: F16,BF16,F32,Q2_K,Q2_K_S,Q3_K_S,Q3_K_M,Q3_K_L,Q4_K_S,Q4_K_M,Q4_K_L,Q5_K_S,Q5_K_M,Q5_K_L,Q6_K,Q8_0
# Recommended: F16,Q4_K_M,Q5_K_M,Q6_K
QUANT_TYPES=F16,Q3_K_M,Q4_K_M,Q5_K_M,Q6_K
# ========================================
# UPLOAD MODE - Choose ONE option below
# ========================================
# OPTION 1: same_repo
# Upload to the same repository as the source model
# Use this only for YOUR OWN models with WRITE access
UPLOAD_MODE=same_repo
# OPTION 2: new_repo
# TARGET_REPO will be auto-generated as: username/ModelName-GGUF
# Leave TARGET_REPO empty for auto-generation (recommended)
# Or specify manually: TARGET_REPO=your-username/custom-name-GGUF
UPLOAD_MODE=new_repo
TARGET_REPO=
# OPTION 3: local_only
# Save to a local directory only (no upload to HuggingFace)
# Files are auto-deleted after LOCAL_CLEANUP_HOURS
UPLOAD_MODE=local_only
OUTPUT_DIR=./output
# Only set if auto-detection fails (empty by default)
# Example: Qwen/Qwen3-0.6B
BASE_MODEL_TOKENIZER=
# Output filename pattern
# Placeholders: {model_name} = extracted base name, {quant} = format type
# Example result: Qwen3-0.6B-Instruct-Q4_K_M.gguf
OUTPUT_PATTERN={model_name}-{quant}.gguf
# Auto-cleanup window in hours (only used in local_only mode)
LOCAL_CLEANUP_HOURS=24
# Timezone
TZ=Asia/Singapore
```
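As a reference for OUTPUT_PATTERN, here is a hypothetical shell illustration of how the two placeholders expand; the tool performs this substitution internally, and the values below are made up:

```bash
# Hypothetical illustration of OUTPUT_PATTERN expansion (not part of the tool)
pattern='{model_name}-{quant}.gguf'
model_name="Qwen3-0.6B-Instruct"           # base name extracted from the source repo
for quant in F16 Q4_K_M Q5_K_M Q6_K; do    # one output file per entry in QUANT_TYPES
  out="${pattern//\{model_name\}/$model_name}"
  out="${out//\{quant\}/$quant}"
  echo "$out"                              # e.g. Qwen3-0.6B-Instruct-Q4_K_M.gguf
done
```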
Environment variable reference:

| ENV Variable | Required? | When to Change | Default if Empty |
|---|---|---|---|
| HUGGINGFACE_TOKEN | Yes | Always (your token) | ERROR |
| REPO_ID | Yes | Always (source model) | ERROR |
| CHECK_INTERVAL | Optional | Set in seconds (3600 = 1h) | 0 (one-time) |
| QUANT_TYPES | Optional | Change to the formats you need | F16,Q4_K_M,Q5_K_M,... |
| UPLOAD_MODE | Optional | Change based on use case | new_repo |
| TARGET_REPO | Optional | Only in new_repo mode | auto: username/ModelName-GGUF |
| OUTPUT_DIR | Optional | Only in local_only mode | ./output |
| BASE_MODEL_TOKENIZER | Optional | Only if auto-detect fails | empty = auto |
| OUTPUT_PATTERN | Optional | Only for custom naming | {model_name}-{quant}.gguf |
| LOCAL_CLEANUP_HOURS | Optional | Only for local_only mode | 24 |
| TZ | Optional | Change to your timezone | UTC |
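Before starting the container, you can sanity-check that the two required variables are set; a minimal sketch, assuming .env sits in the repo root (this helper is not part of the repo):

```bash
# Pre-flight check for the required variables (hypothetical helper)
set -a; source .env; set +a                  # export everything defined in .env
: "${HUGGINGFACE_TOKEN:?HUGGINGFACE_TOKEN is missing in .env}"
: "${REPO_ID:?REPO_ID is missing in .env}"
echo "Config OK: converting $REPO_ID (upload mode: ${UPLOAD_MODE:-new_repo})"
```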
Always Change:
- HUGGINGFACE_TOKEN → your personal token
- REPO_ID → the model to convert

Usually Change:
- CHECK_INTERVAL → polling frequency (or 0 for one-time)
- QUANT_TYPES → the formats you need
- UPLOAD_MODE → based on your use case

Change Only If Needed:
- TARGET_REPO → if using new_repo mode
- OUTPUT_DIR → if using local_only mode
- BASE_MODEL_TOKENIZER → if auto-detect fails
- OUTPUT_PATTERN → if custom naming is wanted
- LOCAL_CLEANUP_HOURS → for a different cleanup window
- TZ → your timezone (up to you)

Never Change (Leave Default):
- Comments (helpful documentation)
- Commented-out options (kept for reference)
Start the converter:

```bash
docker compose up --build -d
```
Monitor the logs, and stop the container when needed:

```bash
docker compose logs -f
# docker compose down
```
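If you chose local_only mode, the results can be verified on disk (assuming the default OUTPUT_DIR=./output):

```bash
# List generated GGUF files and their sizes
ls -lh output/*.gguf
```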
| Format | Precision | Size Reduction | Use Case |
|---|---|---|---|
| F32 | Full (32-bit) | None | Maximum precision |
| F16 | Half (16-bit) | ~50% | High quality general use |
| BF16 | Brain Float 16 | ~50% | Training-optimized |
| Q8_0 | 8-bit | ~75% | Near-lossless compression |
| Q6_K | 6-bit | ~80% | High quality compression |
| Q5_K_M | 5-bit | ~83% | Recommended balance |
| Q4_K_M | 4-bit | ~87% | Popular for production |
| Q3_K_M | 3-bit | ~90% | Aggressive compression |
| Q2_K | 2-bit | ~93% | Maximum compression |
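Once downloaded, any of these quantized files runs on a GGUF-compatible runtime. A minimal sketch using llama.cpp's llama-cli, assuming you have built llama.cpp locally (the filename is illustrative):

```bash
# Run a converted model with llama.cpp (filename is illustrative)
./llama-cli -m Qwen3-0.6B-Instruct-Q4_K_M.gguf \
  -p "Explain GGUF quantization in one sentence." \
  -n 128
```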
This project is licensed under the MIT License - see the LICENSE file for details.