AV-HuBERT (Audio-Visual Hidden Unit BERT)

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

Robust Self-Supervised Audio-Visual Speech Recognition

Introduction

AV-HuBERT is a self-supervised representation learning framework for audio-visual speech. It achieves state-of-the-art results in lip reading, ASR and audio-visual speech recognition on the LRS3 audio-visual speech benchmark.

If you find AV-HuBERT useful in your research, please use the following BibTeX entry for citation.

@article{shi2022avhubert,
    author  = {Bowen Shi and Wei-Ning Hsu and Kushal Lakhotia and Abdelrahman Mohamed},
    title = {Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction},
    journal = {arXiv preprint arXiv:2201.02184}
    year = {2022}
}

@article{shi2022avsr,
    author  = {Bowen Shi and Wei-Ning Hsu and Abdelrahman Mohamed},
    title = {Robust Self-Supervised Audio-Visual Speech Recognition},
    journal = {arXiv preprint arXiv:2201.01763}
    year = {2022}
}

License

AV-HuBERT LICENSE AGREEMENT

This License Agreement (as may be amended in accordance with this License Agreement, “License”), between you (“Licensee” or “you”) and Meta Platforms, Inc. (“Meta” or “we”) applies to your use of any computer program, algorithm, source code, object code, or software that is made available by Meta under this License (“Software”) and any specifications, manuals, documentation, and other written information provided by Meta related to the Software (“Documentation”).

By using the Software, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to use the Software or Documentation (collectively, the “Software Products”), and you must immediately cease using the Software Products.

Pre-trained and fine-tuned models

Please find the checkpoints here

Demo on Colab (Not suitable for Raspberry Pi)

Run our lip-reading demo using Colab:

Installation on Raspberry Pi

First, create a virtual environment and activate it:

sudo apt install python3-venv
python3 -m venv my-project-env
source my-project-env/bin/activate

Then, install Jupyter Notebook:

pip install jupyter
jupyter notebook

Now open avhubert_vsr.ipynb on Jupter Notebook, to execute the code cells in Jupyter Notebook, click on the cell with code in it, then press "Shift+Enter" on your keyboard. This will run the code in that cell.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
fairseq		fairseq
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
avhubert_vsr.ipynb		avhubert_vsr.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AV-HuBERT (Audio-Visual Hidden Unit BERT)

Introduction

License

Pre-trained and fine-tuned models

Demo on Colab (Not suitable for Raspberry Pi)

Installation on Raspberry Pi

About

Releases

Packages

Languages

License

ikram554/AV-HuBERT-for-Raspberry-Pi-4-8GB

Folders and files

Latest commit

History

Repository files navigation

AV-HuBERT (Audio-Visual Hidden Unit BERT)

Introduction

License

Pre-trained and fine-tuned models

Demo on Colab (Not suitable for Raspberry Pi)

Installation on Raspberry Pi

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages