LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment
This is the official repository for LlavaGuard, a versatile framework for evaluating visual content safety compliance. LlavaGuard is designed for both dataset annotation and safeguarding generative models.
📄 Project Page
🤗 Hugging Face Models
- AIML-TUDA/LlavaGuard-Dataset
- AIML-TUDA/LlavaGuard-v1.2-0.5B-OV, AIML-TUDA/LlavaGuard-v1.2-0.5B-OV-hf
- AIML-TUDA/LlavaGuard-v1.2-7B-OV, AIML-TUDA/LlavaGuard-v1.2-7B-OV-hf
LlavaGuard comes with an open pipeline for building safety datasets as well as open pre-trained weights for vision safeguarding. The models can be used for:
- Direct inference (via SGLang and Hugging Face Transformers)
- Fine-tuning via LoRA
- Full model training
# Option 1: Using SGLang
git clone https://github.com/sgl-project/sglang
cd sglang
docker build -f docker/Dockerfile -t sglang .
# Option 2: Using Transformers
pip install transformers torch
We provide two inference options:
- Via SGLang
  - See example scripts in scripts/inference/sglang.ipynb
  - Requires SGLang installation
- Via Transformers
  - See example scripts in scripts/inference/transformers.ipynb
  - Uses standard Hugging Face pipeline
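For the SGLang option, the notebook scripts/inference/sglang.ipynb remains the reference. The snippet below is only a minimal sketch: it assumes an SGLang server is already serving one of the checkpoints locally (e.g., started with `python -m sglang.launch_server --model-path AIML-TUDA/LlavaGuard-v1.2-7B-OV --port 30000`) and queries it through the OpenAI-compatible endpoint. The image file and policy prompt are placeholders; use the policies from llavaguard/taxonomy for real assessments.

```python
# Minimal sketch: query a locally running SGLang server via its OpenAI-compatible API.
# Assumptions: the server is already up (see above), and "example.jpg" plus the
# policy prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

policy_prompt = "Provide a safety assessment for the image according to the policy."  # placeholder

response = client.chat.completions.create(
    model="AIML-TUDA/LlavaGuard-v1.2-7B-OV",  # adjust to the model name your server reports
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": policy_prompt},
        ],
    }],
    max_tokens=256,
    temperature=0.0,
)
print(response.choices[0].message.content)  # safety rating, category, and rationale
```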
The model outputs include:
- Safety rating ("Safe" or "Unsafe")
- Safety category classification
- Detailed rationale for the assessment
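For the Transformers option, scripts/inference/transformers.ipynb remains the reference. The sketch below assumes the -hf checkpoints follow the standard LLaVA-OneVision interface in Transformers; the image path and policy prompt are placeholders, and the actual policy text should be taken from llavaguard/taxonomy.

```python
# Minimal sketch using Hugging Face Transformers.
# Assumption: the -hf checkpoints expose the standard LLaVA-OneVision interface.
# "example.jpg" and the policy prompt are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "AIML-TUDA/LlavaGuard-v1.2-0.5B-OV-hf"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

policy_prompt = "Provide a safety assessment for the image according to the policy."  # placeholder
conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": policy_prompt}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens (the prompt is echoed back otherwise).
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```

The decoded generation is expected to contain the safety rating, category, and rationale described above.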
We offer a pipeline to create new datasets based on a specified version. These versions are defined in llavaguard_config.py. You can also generate custom datasets by extending this file; a hypothetical sketch of such an extension follows the list below.
- To build new datasets, define local paths and dataset/model configurations in llavaguard_config.py.
- To generate rationales for new datasets, see scripts/data/generate_rationales.sh.
- To build new datasets, see scripts/data/prepare_datasets.sh.
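The actual schema of llavaguard_config.py is defined in the repository itself; the snippet below is only a hypothetical illustration of the idea, with every identifier (local_data_dir, local_output_dir, custom_dataset_configs, and the fields inside it) invented for this sketch.

```python
# Hypothetical sketch only -- the real schema lives in llavaguard_config.py.
# All identifiers below are placeholders invented for illustration.

local_data_dir = "/path/to/local/data"      # where raw images/annotations live (placeholder)
local_output_dir = "/path/to/output"        # where prepared datasets are written (placeholder)

# Registering a custom dataset version under an invented key.
custom_dataset_configs = {
    "MyGuard-DS-v1": {
        "images": f"{local_data_dir}/images",
        "annotations": f"{local_data_dir}/annotations.json",
        "taxonomy": "default",  # which taxonomy/template from llavaguard/taxonomy to apply (placeholder)
    },
}
```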
Example scripts for training LlavaGuard are available in the scripts/train directory. Note that you will need to install additional dependencies based on the model you wish to tune:
- LlavaGuard: LLaVA
- LlavaGuard-OV: LLaVA-NeXT
- QwenGuard: LLaMA-Factory
A script for evaluating LlavaGuard is provided in scripts/eval.sh. The evaluation supports different deployment options, which you can select via the engine setting; we recommend using SGLang for deployment. Depending on your chosen engine, you may need to install the corresponding dependencies:
- SGLang: SGLang
- VLLM: VLLM
- LMdeploy: LMdeploy
Our different taxonomies and augmentation techniques can be found in llavaguard/taxonomy.
This paper introduces Llavaguard, a suite of VLM-based vision safeguards that address the critical need for reliable tools in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting Llavaguard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, Llavaguard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate Llavaguard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset and model weights.
If you use LlavaGuard for your research, please cite our paper:
@incollection{helff2024llavaguard,
  crossref  = { https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html },
  key       = { Best Runner-Up Paper Award at NeurIPS RBFM 2024 },
  booktitle = { Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops and Working Notes of the NeurIPS 2024 Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models (RBFM) },
  year      = { 2024 },
  author    = { Lukas Helff and Felix Friedrich and Manuel Brack and Patrick Schramowski and Kristian Kersting },
  title     = { LLAVAGUARD: VLM-based Safeguard for Vision Dataset Curation and Safety Assessment }
}
This repository aims to facilitate research and development in visual content safety. For any questions, suggestions, or issues, please open an issue or submit a pull request.