This repository is the open-source code behind https://trawsgrifiwr.techiaith.cymru. The project wraps the Uned Technolegau Iaith's speech recognition service in a simple UI for creating Welsh-language subtitles.
To get a local copy up and running, follow these steps.
Docker Compose now ships with Docker Desktop. If the docker compose plugin is not
available in your installation, follow the install instructions.
- Clone the repo:
  git clone https://github.com/techiaith/trawsgrifiwr-arlein.git
- Build the Docker image:
  make
- Initialize the database tables before visiting the local site. Start the server first, then call the init script. The database needs time to complete its initial setup, so check the logs for the line
  mysqld: ready for connections.
  before calling the init script:
  make run
  make log
  make init
- During the initialization phase, SSL certificates are generated so that the browser can make audio recordings when the "record" button is selected. These certificates are not signed by a trusted authority, so the browser will show a "this website is not secure" warning, which you will need to accept in order to use the website.
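The wait for the database in the initialization step above can also be scripted. The following is a minimal sketch, not part of the repository: `wait_for_ready` is a hypothetical helper that reads log lines from standard input and succeeds as soon as one contains the given text. It assumes that `make log` prints the current logs and then exits; if it follows the log continuously on your setup, keep checking the log by hand instead.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# Reads log lines from stdin; returns 0 as soon as a line contains $1,
# or 1 if the input ends without a match.
wait_for_ready() {
    pattern=$1
    # The "|| [ -n ... ]" part also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$pattern"*) return 0 ;;
        esac
    done
    return 1
}
```

For example, after `make run`, a loop such as `until make log 2>&1 | wait_for_ready "mysqld: ready for connections."; do sleep 5; done` followed by `make init` would poll the logs every five seconds, again assuming `make log` exits after printing.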
This software was written expressly to run in a Debian server environment, so there is no guarantee that it will build successfully on other operating systems. In addition, the SSL certificates it uses will generate warnings when you visit the app in a browser. These warnings are unavoidable unless you run the server on a public domain or are familiar with local SSL-enabled web application development.
If you do not understand the warnings or the consequences of ignoring them, we recommend that you visit https://trawsgrifiwr.techiaith.cymru instead to try out the software.
To view the app visit https://localhost:6543 and try it out!
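Before opening the browser, it can be handy to confirm from a terminal that the local server is answering. The sketch below is a convenience under stated assumptions, not part of the repository: `check_local_site` is a hypothetical helper, and it relies on curl being installed. The `--insecure` flag tells curl to skip certificate verification, which is needed here only because the locally generated certificates are not trusted.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# Prints the HTTP status code returned by the local site.
# --insecure skips certificate verification (for local, untrusted certs only).
check_local_site() {
    url=${1:-https://localhost:6543}
    curl --insecure --silent --output /dev/null --write-out '%{http_code}\n' "$url"
}
```

Calling `check_local_site` with no arguments queries https://localhost:6543; pass a different URL as the first argument if needed. If the connection fails, curl prints 000 and exits with a non-zero status.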
You can enter a link to an online video that you have posted, upload your own content, or record your voice directly, and our service will generate transcripts.
From these transcripts you can then adjust segment lengths and correct any mistakes or missing punctuation in the text. After editing the transcripts, you can save your work as SRT or TextGrid subtitles.
Trawsgrifiwr cannot yet recognise all of your speech correctly every time, but the following advice can significantly improve the results:
- Ensure your microphone is working correctly and captures good-quality audio.
- Speak clearly and try not to run words together.
- Don't expect English words or less formal words such as rîli, tsips, or neith to be recognised.
- If Trawsgrifiwr still does not recognise your voice well after these steps, you can further increase its chances of success by contributing your voice to our project via Mozilla Common Voice.
- Todo: Road map
We are currently planning new features. Please check back soon or add your opinion via the issue tracker.
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Techiaith - @techiaith - techiaith@bangor.ac.uk - techiaith.cymru
Project Link: https://github.com/techiaith/trawsgrifiwr-arlein
We thank the Welsh Government for funding this work as part of the Technoleg Cymraeg 2021-22 project.