PhoneViz (see our paper) is a phone alignment visualiser that facilitates a deeper analysis of the phone alignments typically used to compare a reference transcription with a speaker's actual pronunciation. PhoneViz provides an interactive environment in which aligned phones are displayed on the IPA chart, helping users explore phonetic variation beyond symbol substitution, insertion, and deletion. PhoneViz was demonstrated at Interspeech 2024 (see the slides). This repository contains the code and resources for the phone alignment visualiser: phone recognition, alignment computation, and the phone visualiser.
The most recent version of this code can be cloned from this repository using the command:

```
git clone https://github.com/MargotUCD/PhoneViz.git
```
Once the repository has been cloned, you'll find the source code in `source`. The resources needed to compute the alignments and run the visualiser can be found in `resources`.
Python 3.8 and pip should be installed. You can install all the required Python packages using this command:

```
pip install -r requirements.txt
```
PhoneViz ships with Wav2Vec2Phoneme[^1] as its phoneme recogniser. eSpeak is required for running Wav2Vec2Phoneme. Follow the instructions below for installation.
Linux: The module is tested with eSpeak version 1.48.15. Install it using the command below, or follow the specific instructions for your distribution:

```
sudo apt-get install espeak=1.48.15
```

Make sure it is installed by running `espeak --version`. You may alternatively install eSpeak-NG (https://github.com/espeak-ng/espeak-ng).
Windows: On Windows, it is recommended to install eSpeak-NG from the official releases (https://github.com/espeak-ng/espeak-ng/releases). It is confirmed to work with version 1.51.
Sometimes Windows has trouble finding the path to the eSpeak DLL. The expected path is `C:\Program Files\eSpeak NG\libespeak-ng.dll`; if your installation path is different, it needs to be passed when creating the PhoneViz demonstration pipeline, as shown below:

```python
pipeline_obj = ShowTellPipeline("your_path_to_espeak_dll")
```
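For example, a minimal pre-flight check could fall back to a custom location when the default one is missing. This is only a sketch: the default DLL path and the `ShowTellPipeline` constructor argument come from this README, but the import path and the alternative DLL location are assumptions you should adapt to your setup.

```python
import os

from source.show_tell_pipeline import ShowTellPipeline  # hypothetical import path

# Default DLL location expected by PhoneViz on Windows (from this README)
ESPEAK_DLL = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"

if os.path.exists(ESPEAK_DLL):
    # The default path works, so no argument is needed
    pipeline_obj = ShowTellPipeline()
else:
    # Example path only: point this at your own eSpeak NG installation
    pipeline_obj = ShowTellPipeline(r"D:\eSpeak NG\libespeak-ng.dll")
```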
The phonetic alignment in PhoneViz is done using sclite from SCTK, the NIST Scoring Toolkit (under licence http://www.nist.gov/open/license.cfm). It does not need to be downloaded, as it comes with PhoneViz under the `resources\sctk-2.4.10\` directory.
Linux:
On Linux, the user needs to give execution permissions to the sclite program in `resources/sctk-2.4.10/bin` using the command below.

```
chmod +x sclite
```
Windows:
To be able to run sclite on Windows, the Windows Subsystem for Linux (WSL) needs to be installed and set up. For documentation on how to install WSL, please refer to the Microsoft help page https://learn.microsoft.com/en-us/windows/wsl/install. Please note that Docker should be installed prior to WSL. You can check the WSL installation by running the `bash --version` command in PowerShell or Windows Command Prompt. As on Linux, the user then needs to give execution permissions to sclite using the command below.

```
chmod +x sclite
```
The source code is in `source`. The resources needed to compute the alignments and run the visualiser are in `resources`. The data you want to run PhoneViz on should be in `data`.
The class `Text` is used for English grapheme to IPA/ARPABET phoneme conversion.
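As an illustration, a hypothetical usage sketch: only the class name `Text` and its purpose come from this README, while the import path and method names below are assumptions.

```python
from source.text import Text  # hypothetical import path

# Hypothetical API: convert an English sentence to phoneme strings
text = Text("Author of the danger trail")
ipa_phonemes = text.to_ipa()          # hypothetical method, IPA output
arpabet_phonemes = text.to_arpabet()  # hypothetical method, ARPABET output
```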
Phone recognition can be performed using the methods in the `PhoneticTranscription` class. The subclass `Wav2Vec2Phoneme` runs wav2vec 2.0 for phone recognition via the `recognize_phones(input_path)` method. Alignment is handled by the `SCLiteAlignment` class, which computes the alignment between reference and hypothesis text and phonemes. The method `get_confusions(hyp, ref)` computes the alignment between the reference and the hypothesis and returns the aligned reference and aligned hypothesis in `Series` format.
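Putting these pieces together, recognition followed by alignment might look like the sketch below. The method names `recognize_phones(input_path)` and `get_confusions(hyp, ref)` are documented above; the import paths, constructor arguments, and tuple-shaped return value are assumptions.

```python
from source.phonetic_transcription import Wav2Vec2Phoneme  # hypothetical import path
from source.alignment import SCLiteAlignment                # hypothetical import path

# Recognise phones in an audio file with wav2vec 2.0
recogniser = Wav2Vec2Phoneme()
hyp_phones = recogniser.recognize_phones("data/utterances/EBVS_arctic_a0001.wav")

# Reference phones would come from the Text class; placeholder value here
ref_phones = "..."

# Align hypothesis against reference; the aligned sequences come back in Series format
aligner = SCLiteAlignment()
ref_aligned, hyp_aligned = aligner.get_confusions(hyp_phones, ref_phones)
```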
The class `ShowTellPipeline` runs the whole [audio files -> phone recognition -> alignment] pipeline. The class `PhoneViz` displays the generated alignments in the PhoneViz interface.
Example usage is shown below and in the Python file `example_use.py`.

```
python example_use.py
```
Then you can choose one of the three modes. Example configurations are:
- Mode 0: `EBVS_arctic_a0001.wav` || Author of the danger trail, Philip Steels, etc.
- Mode 1: `demonstration_dataset_without_alignments.csv`
- Mode 2: `demonstration_dataset_with_alignments.csv`
The path to the audio file should be relative to the `data\utterances` folder.
```python
pipeline_obj = ShowTellPipeline()
philip_stilz_file_name = "EBVS_arctic_a0001.wav"
philip_stilz_ref_text = "Author of the danger trail, Philip Steels, etc."
```
To get the full alignment and display it in PhoneViz:
```python
PhoneViz().phoneviz(pipeline_obj.single_pipeline(philip_stilz_file_name, philip_stilz_ref_text))
```
The dataframe should contain only the columns `file_name` and `reference_text`.
```python
import os

import pandas as pd

# Project root, assuming the script is run from the source folder
project_path = os.path.dirname(os.path.abspath("__file__")).replace("source", "")
test_df = pd.read_csv(project_path + "data\\alignments\\l2arctic_head_without.csv")
```
To get the full alignments and display it in PhoneViz:
```python
PhoneViz().phoneviz(pipeline_obj.dataframe_pipeline(test_df))
```
The dataframe should contain the columns:
- `file_name`: the relative path to the audio file,
- `reference_text`: the reference text in string format,
- `reference_phonemes`: the reference phones as a word-by-word list in string format,
- `hypothesis_phonemes`: the hypothesis phones as a list in string format,
- `ref_pho_align`: the aligned reference phones as a phone-by-phone list in string format,
- `hyp_pho_align`: the aligned hypothesis phones as a phone-by-phone list in string format.

For reference, see `demonstration_dataset_with_alignments.csv`.
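Because the phone columns store Python-style lists serialised as strings, they can be parsed back into lists with `ast.literal_eval`. A sketch below: the column names come from this README, but the phone values are purely illustrative.

```python
import ast

import pandas as pd

# Illustrative row only: the actual phone symbols and alignment markers may differ
row = pd.Series({
    "ref_pho_align": "['ɔ', 'θ', 'ə', 'r']",
    "hyp_pho_align": "['ɑ', 'θ', 'ə', 'r']",
})

ref_phones = ast.literal_eval(row["ref_pho_align"])  # ['ɔ', 'θ', 'ə', 'r']
hyp_phones = ast.literal_eval(row["hyp_pho_align"])  # ['ɑ', 'θ', 'ə', 'r']
```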
```python
import os

import pandas as pd

# Project root, assuming the script is run from the source folder
project_path = os.path.dirname(os.path.abspath("__file__")).replace("source", "")
test_df = pd.read_csv(project_path + "data\\alignments\\l2arctic_head_with.csv")
```
To get the full alignments and display it in PhoneViz:
```python
PhoneViz().phoneviz(test_df)
```
This work was conducted with the financial support of the Science Foundation Ireland (SFI) Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224 and the ADAPT SFI Research Centre under Grant Agreement No. 13/RC/2106 P2 at University College Dublin.
All the example audio files come from L2-ARCTIC[^2].
Please cite the following paper if you use this code in your work.
```bibtex
@inproceedings{masson24_interspeech,
  title     = {PhoneViz: exploring alignments at a glance},
  author    = {Margot Masson and Erfan A. Shams and Iona Gessinger and Julie Carson-Berndsen},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {3648--3649},
  issn      = {2958-1796},
  url       = {https://www.isca-archive.org/interspeech_2024/masson24_interspeech.pdf}
}
```
Footnotes

[^1]: Xu, Q., Baevski, A., Auli, M. (2022) Simple and Effective Zero-shot Cross-lingual Phoneme Recognition. Proc. Interspeech 2022, 2113-2117. https://doi.org/10.21437/Interspeech.2022-60

[^2]: Zhao, G., Sonsaat, S., Silpachai, A., Lucic, I., Chukharev-Hudilainen, E., Levis, J., Gutierrez-Osuna, R. (2018) L2-ARCTIC: A Non-native English Speech Corpus. Proc. Interspeech 2018, 2783-2787. https://doi.org/10.21437/Interspeech.2018-1110