GGUF LLM Converter for HuggingFace Hub Models with Multiple Quantizations (GGUF Format)

Automated conversion of any HuggingFace model to multiple GGUF quantization formats.
Supports continuous monitoring, auto-detection, and flexible deployment modes.



📖 Overview

Universal GGUF LLM Converter is a production-ready, Docker-based solution for automatically converting HuggingFace models to GGUF format with multiple quantization types. Built on llama.cpp with intelligent tokenizer detection, it streamlines the conversion workflow for both personal and community models.
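
For context, this is the standard llama.cpp pipeline that the converter automates. A rough manual equivalent looks like the following (the model name and paths are illustrative, not the tool's internals):

# 1. Download the source model from HuggingFace (example repo)
huggingface-cli download Qwen/Qwen3-0.6B --local-dir ./Qwen3-0.6B

# 2. Convert the HuggingFace weights to a GGUF file (F16)
python llama.cpp/convert_hf_to_gguf.py ./Qwen3-0.6B --outfile Qwen3-0.6B-F16.gguf --outtype f16

# 3. Quantize the F16 GGUF into a smaller format
llama.cpp/llama-quantize Qwen3-0.6B-F16.gguf Qwen3-0.6B-Q4_K_M.gguf Q4_K_M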

Key Features

  • 🔄 Continuous Monitoring: Automatically detects and converts new model updates from HuggingFace repositories (see the conceptual sketch after this list)
  • 🤖 Auto-Detection: Intelligent tokenizer detection for 50+ popular model architectures (Qwen, Llama, Mistral, Phi, Gemma, etc.)
  • 📦 Multiple Quantization: Supports F16, F32, BF16, and all K-quant formats (Q2_K to Q8_0)
  • 🎯 Flexible Deploy: Three (3) upload modes - same repository, new repository, or local-only storage
  • 🧹 Smart Cleanup: Automatic temporary-file management to prevent excessive storage use
  • 🐳 Docker: Fully containerized with optimized build times and resource usage
  • 📊 Progress Tracking: Clean, milestone-based logging with colorized console output
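
Conceptually, the monitoring mode polls the HuggingFace API for the repository's latest commit and re-runs the conversion when it changes. The sketch below illustrates that behavior only (it is not the tool's actual code, which is implemented in Python; REPO_ID and CHECK_INTERVAL are the variables from .env, and it assumes curl and jq are installed):

# Conceptual sketch of monitoring-mode behavior
last_sha=""
while true; do
  # Latest commit SHA of the source repo via the public HF API
  sha=$(curl -s "https://huggingface.co/api/models/${REPO_ID}" | jq -r .sha)
  if [ "$sha" != "$last_sha" ]; then
    echo "New commit detected ($sha), running conversion..."
    last_sha="$sha"
    # ... conversion and upload steps run here ...
  fi
  # CHECK_INTERVAL=0 means convert once and exit
  [ "${CHECK_INTERVAL:-0}" -eq 0 ] && break
  sleep "$CHECK_INTERVAL"
done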

🛠️ Requirements


System Requirements:

  • Linux-based VPS or local machine
  • Docker & Docker Compose installed
  • HuggingFace account with WRITE access token
  • Sufficient disk space for model downloads and conversion (varies by model size; see the quick check below)
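
As a rule of thumb you need the source model's download size plus one output file per quantization format. A quick check of free space in the directory where the converter will run (standard Linux command):

# Show free disk space for the current directory's filesystem
df -h .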

📁 Project Structure

gguf-convert-model/
├── .env
├── .env.example
├── .gitignore
├── .dockerignore
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── README.md
├── scripts/
│   └── start.sh
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   └── utils/
│       ├── __init__.py
│       ├── logger.py
│       └── helpers.py
└── logs/ (auto-created)

🚀 Quick Start

1. Prerequisites

HuggingFace Access Token: create a token with WRITE permission in your HuggingFace account settings (Settings → Access Tokens).

Install Docker & Docker Compose if not already installed

Installing Docker this way is optional; if you don't already have it, the script below installs it:

curl -sSL https://raw.githubusercontent.com/arcxteam/succinct-prover/refs/heads/main/docker.sh | sudo bash
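
Whether you use the script above or another install method, you can verify the installation with standard Docker commands:

docker --version
docker compose version
# Optional: confirm the daemon can actually run containers
docker run --rm hello-world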

2. Clone Repository

git clone https://github.com/arcxteam/gguf-convert-model.git
cd gguf-convert-model

3. Configure Environment

Create, edit, and save the configuration file:

cp .env.example .env
nano .env

Example .env configuration:

# HF token with WRITE permission
HUGGINGFACE_TOKEN=hf_xxxxxxxx

# Source model repository to convert
# Example: Qwen/Qwen3-0.6B
REPO_ID=username/model-name

# Check interval in seconds
# Default 0 = one-time conversion; set a larger value to keep watching for new commits
CHECK_INTERVAL=0

# Output formats (comma-separated, no spaces)
# Available: F16,BF16,F32,Q2_K,Q2_K_S,Q3_K_S,Q3_K_M,Q3_K_L,Q4_K_S,Q4_K_M,Q4_K_L,Q5_K_S,Q5_K_M,Q5_K_L,Q6_K,Q8_0
# Recommended: F16,Q4_K_M,Q5_K_M,Q6_K
QUANT_TYPES=F16,Q3_K_M,Q4_K_M,Q5_K_M,Q6_K

# ========================================
# UPLOAD MODE - Choose ONE option below
# ========================================

# OPTION 1: same_repo
# Upload to the same repository as the source model
# Use this only for YOUR OWN models with WRITE access
UPLOAD_MODE=same_repo

# OPTION 2: new_repo
# TARGET_REPO will be auto-generated as: username/ModelName-GGUF
# Leave TARGET_REPO empty for auto-generation (recommended)
# Or manually specify: TARGET_REPO=your-username/custom-name-GGUF
UPLOAD_MODE=new_repo
TARGET_REPO=

# OPTION 3: local_only
# Save to a local directory only (no upload to HuggingFace)
# Files are auto-deleted after LOCAL_CLEANUP_HOURS
UPLOAD_MODE=local_only
OUTPUT_DIR=./output

# Tokenizer override - only set if auto-detection fails (default: empty = auto)
# Example: Qwen/Qwen3-0.6B
BASE_MODEL_TOKENIZER=

# Output filename pattern (default)
# Placeholders: {model_name} = extracted base name, {quant} = format type
# Result example: Qwen3-0.6B-Instruct-Q4_K_M.gguf
OUTPUT_PATTERN={model_name}-{quant}.gguf

# Auto-cleanup hours - only relevant in local_only mode (default: 24)
LOCAL_CLEANUP_HOURS=24

# Timezone
TZ=Asia/Singapore
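
For the most common case (convert once and publish the GGUF files to a new repository under your account), a minimal .env could look like this (the token and source repository are placeholders):

HUGGINGFACE_TOKEN=hf_xxxxxxxx
REPO_ID=Qwen/Qwen3-0.6B
CHECK_INTERVAL=0
QUANT_TYPES=F16,Q4_K_M,Q5_K_M
UPLOAD_MODE=new_repo
TARGET_REPO=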

📊 Configuration Reference

| ENV Variable | Required? | When to Change | Default if Empty |
|---|---|---|---|
| HUGGINGFACE_TOKEN | ✅ Yes | Always (your token) | ERROR |
| REPO_ID | ✅ Yes | Always (source model) | ERROR |
| CHECK_INTERVAL | ⚠️ Optional | Set 0 for one-time conversion, or an interval in seconds | 3600 (1 hour) |
| QUANT_TYPES | ⚠️ Optional | Change to the formats you need | F16,Q4_K_M,Q5_K_M, and more |
| UPLOAD_MODE | ⚠️ Optional | Change based on use case | new_repo |
| TARGET_REPO | ⚠️ Conditional | Only in new_repo mode | Auto-generated (username/ModelName-GGUF) |
| OUTPUT_DIR | ⚠️ Conditional | Only in local_only mode | ./output |
| BASE_MODEL_TOKENIZER | ❌ Optional | Only if auto-detection fails | empty = auto |
| OUTPUT_PATTERN | ❌ Optional | Only for custom naming | {model_name}-{quant}.gguf |
| LOCAL_CLEANUP_HOURS | ❌ Optional | Only for local_only mode | 24 hours |
| TZ | ❌ Optional | Change to your timezone | UTC |

✅ Checklist - What to Change

Always Change:

  • ✅ HUGGINGFACE_TOKEN → Your personal token
  • ✅ REPO_ID → Model to convert

Usually Change:

  • ⚠️ CHECK_INTERVAL → Check frequency in seconds (or 0 for one-time)
  • ⚠️ QUANT_TYPES → Formats you need
  • ⚠️ UPLOAD_MODE → Based on use case

Change Only If Needed:

  • ❌ TARGET_REPO → If using new_repo mode
  • ❌ OUTPUT_DIR → If using local_only mode
  • ❌ BASE_MODEL_TOKENIZER → If auto-detection fails
  • ❌ OUTPUT_PATTERN → If custom naming is wanted
  • ❌ LOCAL_CLEANUP_HOURS → If a different cleanup time is needed
  • ❌ TZ → Your timezone (optional)

Never Change (Leave Default):

  • ✅ Comments (helpful documentation)
  • ✅ Commented-out options (for reference)
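
After editing, a quick sanity check of which values are actually active (uncommented) in your .env:

# List the active settings in .env
grep -E '^[A-Z_]+=' .env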

4. 🏃 Build and Start

Start the container:

docker compose up --build -d

Monitor the logs (and stop the container when done):

docker compose logs -f
# docker compose down
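
Once the logs show the conversion has finished, you can confirm the results. In the upload modes, check the target repository on HuggingFace; in local_only mode, the files land in OUTPUT_DIR (the commands below are standard shell/Docker, with the path from the example config):

# Container status
docker compose ps

# local_only mode: list the generated GGUF files
ls -lh ./output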

📊 Supported Quantization Formats

| Format | Precision | Size Reduction | Use Case |
|---|---|---|---|
| F32 | Full (32-bit) | None | Maximum precision |
| F16 | Half (16-bit) | ~50% | High-quality general use |
| BF16 | Brain Float 16 | ~50% | Training-optimized |
| Q8_0 | 8-bit | ~75% | Near-lossless compression |
| Q6_K | 6-bit | ~80% | High-quality compression |
| Q5_K_M | 5-bit | ~83% | Recommended balance |
| Q4_K_M | 4-bit | ~87% | Popular for production |
| Q3_K_M | 3-bit | ~90% | Aggressive compression |
| Q2_K | 2-bit | ~93% | Maximum compression |
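
To sanity-check a converted file, you can load it with llama.cpp's CLI (this assumes you have llama.cpp built locally; the binary path depends on how you built it, and the filename follows the OUTPUT_PATTERN example above):

# Quick generation test with a quantized GGUF
llama.cpp/build/bin/llama-cli -m Qwen3-0.6B-Q4_K_M.gguf -p "Hello, world" -n 32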

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.