Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

The University of British Columbia, Invertible AI

🔥 Details will be released. Stay tuned 🍻 👍

If you find this work useful for your research, please kindly cite our paper and star our repository.

Updates

  • [09/12/2024] We've released Peacock weights: Model
  • [09/12/2024] We've released the Henna benchmark and evaluation datasets: Henna & Eval
  • [15/05/2024] Peacock has been accepted at the ACL2024 main conference.
  • [01/03/2024] ArXiv paper released.

Abstract

Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, including even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed Peacock, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce Henna, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, setting the first stone for culturally-aware Arabic MLLMs.

Henna Benchmark

This collection of images showcases a curated subset of the Henna dataset, representing 11 Arab countries and capturing traditional food, local customs, historical monuments, everyday activities, and the distinctive architecture that characterize the diverse and rich heritage of each region.

*Figure: Henna samples.*

Henna Dataset Generation

Dataset generation example using GPT-4V. The figure below illustrates the process of generating a question-answer dataset, using an attraction in Yemen as an example. For each site, an image and its corresponding Wikipedia article were provided to GPT-4V as rich contextual information. The model then generated ten contextually relevant questions and answers per image.

*Figure: Henna pipeline.*
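The generation step above can be sketched as a single vision-model request per site. The following is an illustrative Python sketch only: the function name, prompt wording, and the `gpt-4-vision-preview` model string are assumptions for demonstration, not the authors' exact pipeline.

```python
import base64


def build_qa_request(image_path, article_text, n_questions=10,
                     model="gpt-4-vision-preview"):
    """Build a chat-completion payload asking a vision model to generate
    question-answer pairs about a site, grounded in its image and
    Wikipedia article. (Illustrative sketch, not the official pipeline.)"""
    # Encode the site image for inline transmission as a data URL.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = (
        f"Using the attached image and the article below, generate "
        f"{n_questions} contextually relevant question-answer pairs about "
        f"this site. Return a JSON list of objects with keys "
        f'"question" and "answer".\n\nArticle:\n{article_text}'
    )

    # Shape follows the OpenAI chat-completions format for image inputs:
    # one user message mixing a text part and an image_url part.
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    }
```

The returned dictionary would then be sent to the model API, and the JSON answer parsed into the ten QA pairs per image described above.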

Evaluation Results

*Figure: Comparison of the performance of Peacock models across SEED-Bench dimensions.*

Examples

*Figures: qualitative examples.*

Citation

If you find this work useful for your research, please kindly cite our paper:

```bibtex
@inproceedings{alwajih-etal-2024-peacock,
    title = "Peacock: A Family of {A}rabic Multimodal Large Language Models and Benchmarks",
    author = "Alwajih, Fakhraddin  and
      Nagoudi, El Moatez Billah  and
      Bhatia, Gagan  and
      Mohamed, Abdelrahman  and
      Abdul-Mageed, Muhammad",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.689",
    doi = "10.18653/v1/2024.acl-long.689",
    pages = "12753--12776"
}
```