NKOAPP is an application written in Python that uses open-source programs: Praat for sound analysis, and the synthesis backend of VocalTractLab for advanced control of vocal tract shapes and glottis properties over time. The initial goals for the system were to produce both 'natural' and 'unnatural' sounding vocalizations and articulations, and to interpolate between mouth and glottis configurations so that invariants (physical properties) and variants (f0, intensity, filtering coefficients) can change continuously from one 'speaker identity' to another. Using features from actual voice recordings later became an important way of working with the system: variant parameters such as f0 and loudness are taken from a recording to reproduce its prosody, intonation, jitter and shimmer patterns.
In broad terms, the synthesis procedure follows the source-filter paradigm: the vocal fold vibration is the excitation (the source), and the vocal tract then filters the resulting glottal pulses.
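The source-filter idea can be illustrated with a minimal sketch. This is not how VocalTractLab computes its output; it only shows the principle, using a plain impulse train as the source and a single two-pole resonator as a stand-in for one formant of the vocal tract. The function names, the sample rate and the formant values are all assumptions made for this illustration.

```python
import math

def pulse_train(f0_hz, duration_s, sample_rate=16000):
    """Source: one unit impulse per glottal period (a crude stand-in for glottal pulses)."""
    n_samples = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate=16000):
    """Filter: a two-pole resonator standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius (< 1, so stable)
    theta = 2.0 * math.pi * centre_hz / sample_rate       # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2   # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

# Hypothetical values: 100 Hz excitation, one formant near 700 Hz
source = pulse_train(f0_hz=100.0, duration_s=0.05)
voiced = resonator(source, centre_hz=700.0, bandwidth_hz=100.0)
```

Changing `f0_hz` alters only the source, while changing `centre_hz` or `bandwidth_hz` alters only the filter, which is the separation the paradigm relies on.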
Sound examples can be found in NKOAPP
- First of all, download the latest versions of the Praat and VocalTractLab software.
- Then clone this repository to your computer.
- If you want to use f0 and intensity data from an audio recording, follow these steps:
  - Open `Praat` > Open > Read from file... and select the audio file you'd like to analyze.
  - Select this sound in the Object box, then go to the toolbar option Praat > Open Praat script...
  - Navigate to the folder `NKOAPP\Praat Scripts\` and choose the `f0_intensity.txt` file. Then click on Run in the script popup window.
  - Wait a bit for the script to finish, then save the file as a txt-file with an arbitrary name in the folder `NKOAPP\f0andIntensity`, for example as `f0andIntVoxAdam.txt`.
  - Done for now. Continue to step 4 (opening `NKOAPP.py`).
- Open the `NKOAPP.py` file in your IDE of choice, and open the `VocalTractLab2` software.
- First of all, choose the glottis model with `glChoice = glGeomOptions['GM']`. There are three choices: `'GM'`, `'2M'` or `'TRI'`.
  - In VocalTractLab2, choose the same glottis model as `glChoice` (see above): Synthesis models > Vocal folds model > choose one of the three > then click on Use selected model for synthesis.
- If you want to use the f0 and intensity data from the recording, add the filename in this part of the code: `f0andIntFileName = directory + '/f0andIntensity/filename.txt'`, where `filename.txt` is the name of the file you saved.
- Decide whether to generate the tractSequence manually or from the f0 and intensity values produced by the Praat script:
  - `manual = True`: go to Section A in the code
  - `manual = False`: go to Section B in the code
- There are two ways of working with NKOAPP through manual input of glottis parameters, vocal tract shapes, and durations of the interpolations between sources and targets. The names of the glottis and vocal tract presets can be found in the `speakerJD2.json` file.
- Option A1: set the glottis and tract targets (`glOption`, `trOption`) and the durations between them.
- Option A2: generate random interpolations between a set of chosen glottis and tract targets (`glOption`, `trOption`), with random durations.
- Follow the comments in the code. Don't forget to comment out the parts you don't need when working in another section or another option.
- Same as above, but now the number of frames in the tractSequence file corresponds to the duration of the audio file. (This cannot be changed.)
- Change the durations between targets by proportioning the segments in different ways:
  - make the durations shorter and shorter, starting from the longest part: `durModulationG = arithmetic_progression(1,valT)[::1]`, or make the interpolations longer and longer, starting from the shortest part: `durModulationG = arithmetic_progression(1,valT)[::-1]`
  - change the exponent of the durations
- Take the time-discrete derivative of the splines. The splines are cubic polynomials, and the highest derivative offered is the 2nd. Three options are therefore available for both the glottis and tract interpolations, set in `derivativeGlot` and `derivativeTract`: 0 = no derivative, 1 = 1st derivative, 2 = 2nd derivative. This is still experimental; use at your own risk.
- Choose the lower and upper bounds of the uniform random factor used to randomize the glottis parameters (f0, Intensity, Jaw Height, Tongue Height, etc.).
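The idea of proportioning segment durations can be sketched as follows. This is a minimal illustration and not the code in `NKOAPP.py`: the function `segment_durations` and its parameters are hypothetical, and it makes no assumption about how `arithmetic_progression` is defined in the repository or which slice direction it pairs with which ordering. It builds weights 1, 2, ..., n, optionally reverses them and raises them to an exponent, then scales them so the segments add up to a chosen total duration.

```python
def segment_durations(total_s, n_segments, reverse=False, exponent=1.0):
    """Split total_s seconds into n_segments parts weighted by (1, 2, ..., n) ** exponent.

    Hypothetical helper for illustration only.
    """
    weights = [float(k) ** exponent for k in range(1, n_segments + 1)]
    if reverse:
        weights.reverse()              # longest segment first
    scale = total_s / sum(weights)     # normalise so the parts sum to total_s
    return [w * scale for w in weights]

growing = segment_durations(6.0, 3)                   # segments get longer
shrinking = segment_durations(6.0, 3, reverse=True)   # segments get shorter
steeper = segment_durations(14.0, 3, exponent=2.0)    # exponent exaggerates the contrast
```

A larger exponent increases the contrast between the shortest and longest segments, while an exponent of 0 makes all segments equal.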
Now build (run) the `NKOAPP.py` file.
- After building, there should be a tractSequence file in the `\TractSequence` folder that looks something like this: `2024-05-13_02-50-43_TractSequence_ConstPressureAndf0_Geometric glottis_n9_normalDist[1, 1]_[0, 0]_Manual.txt`
- Go to `VocalTractLab2` and navigate to Synthesis from file > Tract sequence file to audio. Then find the tractSequence file in the aforementioned folder and select it.
  - This can take a long time if the tractSequence file is longer than 10 s. Ignore the "The Program is not responding" message; it's just rendering.
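Because the example filename above starts with a `YYYY-MM-DD_HH-MM-SS` timestamp, sorting the filenames alphabetically also sorts them chronologically. The sketch below uses that to locate the most recent tractSequence file before selecting it in VocalTractLab2. The helper name `newest_tract_sequence` is hypothetical, and the glob pattern assumes every output file follows the naming shown in the example.

```python
from pathlib import Path

def newest_tract_sequence(folder):
    """Return the most recent tractSequence file in folder, or None if there is none.

    Relies on the timestamp prefix in the filename (an assumption based on the
    example name above), so alphabetical order equals chronological order.
    """
    candidates = sorted(Path(folder).glob("*_TractSequence_*.txt"))
    return candidates[-1] if candidates else None
```

For example, `newest_tract_sequence("TractSequence")` would return the path of the latest file in a folder named `TractSequence` relative to the current working directory.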
by Lawrence McGuire