CulturalGround: Grounding Multilingual Multimodal LLMs With Cultural Knowledge

This repository provides the official resources for the CulturalPangea model and the CulturalGround dataset.

About CulturalGround and CulturalPangea

The CulturalGround dataset contains 30 million high-quality, culturally rich Visual Question Answering (VQA) pairs (22M open-ended and 8M multiple-choice samples) spanning 42 countries and 39 languages, curated from Wikidata.

CulturalPangea is an open-source multilingual multimodal large language model (MLLM) fine-tuned to understand culturally significant entities from around the world. It addresses the common issue of MLLMs misinterpreting long-tail cultural entities by grounding the model directly in diverse cultural knowledge. Starting from the Pangea model, CulturalPangea is further trained on our new CulturalGround dataset, and with this training it achieves state-of-the-art performance among open models on culture-focused benchmarks without degrading performance on mainstream vision-language tasks.

Repository Structure

The repository is organized into the following directories:

  • data_curation: Contains code for the data curation pipeline. See the documentation in that directory for details.
  • train: Contains scripts and instructions for fine-tuning the CulturalPangea model.
  • evaluation: Includes code for assessing the model's performance on culture-specific and general multilingual multimodal benchmarks.

Setting Up

To get started with CulturalPangea:

  1. Clone the Repository: Use Git to clone this repository to your local environment.
  2. Install Dependencies: For training and inference, install the LLaVA-NeXT framework:
    cd train/LLaVA-NeXT
    pip install -e ".[train]"
    For evaluation, you need to install the lmms-eval framework:
    cd evaluation/lmms-eval
    pip install -e .
  3. Download the Dataset: The CulturalGround dataset is available on Hugging Face at neulab/CulturalGround. It consists of JSON files for VQA data and TAR archives containing the images for each country.
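
As a minimal sketch, the dataset can be fetched with the huggingface_hub library; the local directory path below is an assumption, so adjust it to your setup:

    # Download the CulturalGround JSON files and per-country image TAR archives
    # from the Hugging Face Hub (requires: pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="neulab/CulturalGround",
        repo_type="dataset",
        local_dir="data/CulturalGround",  # assumed target directory; adjust as needed
    )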

Data Format

The CulturalGround dataset follows the LLaVA format. Each instance contains a unique ID, an image path, a series of conversations, and a language tag.

Below is an example of one such data instance:

{
    "id": "...",
    "image": "images/spain/Q5050823_Castro_de_Baroña_y_playa_de_Arealonga.png",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nWhich culture is this entity associated with?"
        },
        {
            "from": "gpt",
            "value": "The castro is associated with the Castro culture, an Iberian archaeological culture."
        }
    ],
    "language": "en"
}

Data Structure:

  • id: Unique identifier for the data sample.
  • image: The path to the image file used in this instance.
  • conversations: A series of conversations between a "human" and the model ("gpt").
    • from: Identifies the speaker ("human" or "gpt").
    • value: The content of the message, including text and image tokens.
  • language: The language of the conversation.
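
As a minimal sketch of how to read this format (the filename below is a placeholder, not an actual file in the release), a sample can be loaded and its conversation printed as follows:

    import json

    # Load one CulturalGround annotation file (filename is a placeholder).
    with open("data/CulturalGround/open_ended_sample.json", encoding="utf-8") as f:
        samples = json.load(f)

    sample = samples[0]
    print(sample["id"], sample["language"], sample["image"])
    for turn in sample["conversations"]:
        # Each turn records the speaker ("human" or "gpt") and the message text;
        # the human turn may include the "<image>" token.
        print(f'{turn["from"]}: {turn["value"]}')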

Training

CulturalPangea is created by fine-tuning the pre-trained Pangea-7B model.

  1. Prepare the Data: Ensure the CulturalGround JSON files and the corresponding image folders (extracted from the downloaded TAR archives; see the extraction sketch after this list) are placed in the designated directory as specified in the fine-tuning script. The training split consists of 13M open-ended and 5M multiple-choice questions.
  2. Run the Fine-tuning Script:
    cd train
    ./LLaVA-NeXT/scripts/train/finetune_culturalpangea.sh
    This script fine-tunes the connector and LLM parts of the model while keeping the vision encoder frozen.
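
A rough sketch for the data-preparation step above, assuming the archives were downloaded to data/CulturalGround; the paths are assumptions, so match them to what the fine-tuning script expects and adjust if the archives already contain the images/ prefix:

    import tarfile
    from pathlib import Path

    # Extract every per-country image archive into a shared images/ directory
    # so that paths like "images/spain/..." in the JSON files resolve.
    archive_dir = Path("data/CulturalGround")
    image_dir = archive_dir / "images"
    image_dir.mkdir(parents=True, exist_ok=True)

    for archive in sorted(archive_dir.glob("*.tar")):
        with tarfile.open(archive) as tar:
            tar.extractall(path=image_dir)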

Evaluation

To evaluate CulturalPangea on benchmarks such as CVQA and MaRVL:

  1. Navigate to the Evaluation Directory:

    cd evaluation
  2. Run the Evaluation Script:

    # Set the model path and task
    MODEL_PATH="neulab/CulturalPangea-7B" # Or your local checkpoint path
    TASK="marvl" # Example task
    
    python3 -m accelerate.commands.launch \
          --num_processes=8 \
          -m lmms_eval \
          --model llava \
          --model_args pretrained=$MODEL_PATH,conv_template=qwen_1_5 \
          --tasks ${TASK} \
          --batch_size 1 \
          --log_samples \
          --log_samples_suffix ${TASK} \
          --output_path eval_logs

    To evaluate other models, replace ${MODEL_PATH} and adjust the --model_args as needed. For detailed instructions and the full list of evaluation tasks, refer to the scripts in the evaluation directory.

Citation

If you use CulturalGround or CulturalPangea in your research, please cite:

@misc{nyandwi2025groundingmultilingualmultimodalllms,
  title={Grounding Multilingual Multimodal LLMs With Cultural Knowledge},
  author={Jean de Dieu Nyandwi and Yueqi Song and Simran Khanuja and Graham Neubig},
  year={2025},
  eprint={2508.07414},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.07414}
}

Acknowledgments

We thank the teams behind Pangea and LLaVA-NeXT for providing the foundational models and frameworks that made this work possible. CulturalPangea builds directly upon Pangea's multilingual capabilities and leverages the LLaVA-NeXT training infrastructure.
