Deep VSR (speechreading) with Face Inputs

Description

This repository provides PyTorch code and models for evaluating the methods in our paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition". The models are trained on the LRW dataset (English) and the LRW-1000 dataset (Mandarin Chinese).

Model Zoo

Coming soon...

Citation

@inproceedings{zhang2020can,
    author = {Y. Zhang and S. Yang and J. Xiao and S. Shan and X. Chen},
    booktitle = {2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)},
    title = {Can We Read Speech Beyond the Lips? {R}ethinking {RoI} Selection for Deep Visual Speech Recognition},
    year = {2020},
    pages = {851--858},
    keywords = {visual speech recognition},
    doi = {10.1109/FG47880.2020.00134},
    url = {https://doi.ieeecomputersociety.org/10.1109/FG47880.2020.00134},
    publisher = {IEEE Computer Society},
    address = {Los Alamitos, CA, USA}
}

License

TBD

Contact

Yuanhang Zhang
