A subtitle translator powered by Open-Source Artificial Intelligence models.
- Overview
- Features
- Project Structure
- Getting Started
- Testing
- Important Note
- Project Roadmap
- Contributing
- License
- Acknowledgments
Sucata is an open-source tool designed to extract and translate subtitles from .mkv files, as well as process .srt, .ass, and .ssa subtitle files directly. Featuring a user-friendly interface, Sucata leverages AI language models, such as Llama, to deliver high-quality translations while preserving the context and style of the original dialogues.
- Subtitle Extraction:
  - Compatible with subtitle tracks in MKV files.
  - Track selection via GUI for enhanced usability.
- Intelligent Translation:
  - Support for .srt, .ass, and .ssa files.
  - Contextual adaptation of slang, cultural expressions, and emotional tones.
- Graphical User Interface:
  - Built with Tkinter, offering simplicity and accessibility.
- Multi-Language Support:
  - Supports multiple languages, including Arabic, Bengali, English, French, German, Hindi, Indonesian, Japanese, Korean, Mandarin Chinese, Marathi, Portuguese, Brazilian Portuguese, Russian, Spanish, Tamil, Telugu, Turkish, Urdu, Vietnamese, and Western Punjabi.
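Subtitle extraction and GUI track selection rely on MKVToolNix. The following sketch shows one plausible way to do it, assuming the tools are invoked by name from PATH: `mkvmerge -J` prints a JSON description of the file's tracks, and recent MKVToolNix versions accept `mkvextract <file> tracks <id>:<output>`. The function names are hypothetical and not taken from app.py.

```python
"""Hypothetical sketch: list and extract MKV subtitle tracks with MKVToolNix."""
import json
import subprocess


def identify(mkv_path: str) -> dict:
    """Run `mkvmerge -J` (JSON identification) and return the parsed output."""
    result = subprocess.run(["mkvmerge", "-J", mkv_path], capture_output=True, text=True)
    # mkvmerge exits with 0 on success, 1 on warnings, 2 on errors
    if result.returncode >= 2:
        raise RuntimeError(result.stdout or result.stderr)
    return json.loads(result.stdout)


def subtitle_tracks(info: dict) -> list[dict]:
    """Keep only subtitle tracks, with the fields a track picker would display."""
    tracks = []
    for track in info.get("tracks", []):
        if track.get("type") != "subtitles":
            continue
        props = track.get("properties", {})
        tracks.append({
            "id": track["id"],
            "codec": track.get("codec", ""),
            "language": props.get("language", "und"),
            "name": props.get("track_name", ""),
        })
    return tracks


def extract_command(mkv_path: str, track_id: int, out_path: str) -> list[str]:
    """Build an mkvextract call in the `<file> tracks <id>:<output>` form."""
    return ["mkvextract", mkv_path, "tracks", f"{track_id}:{out_path}"]
```

In a GUI, the list returned by `subtitle_tracks(identify(path))` could back a track picker, and the selected track's command would then be passed to `subprocess.run`.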
└── sucata/
    ├── app.py                  # Main application script
    ├── fonts/                  # Fonts used by the interface
    │   ├── FKGroteskNeueTrial-Bold.otf
    │   ├── FKGroteskNeueTrial-Regular.otf
    │   └── Horizon.otf
    ├── img/                    # Project images
    │   ├── sucata_hello.png
    │   ├── sucata_icon.ico
    │   ├── sucata_preview.jpeg
    │   └── kofi_pt-BR.png
    ├── requirements.txt        # Project dependencies
    ├── README.md               # Main README
    └── README-pt-BR.md         # Brazilian Portuguese README
- Python: Requires Python 3.9 or later.
- Pip: Python's package manager.
- External Tools: mkvextract and mkvmerge (part of MKVToolNix) for MKV file handling.
- A Hugging Face account (optional): Needed to access certain AI models like Llama.
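Before the first run, the prerequisites above can be checked from Python. This is a hypothetical pre-flight sketch, not part of Sucata: it assumes the app calls mkvextract and mkvmerge by name, so they must be discoverable on PATH.

```python
"""Hypothetical pre-flight check for the prerequisites listed above."""
import shutil
import sys
from typing import List

REQUIRED_TOOLS = ("mkvextract", "mkvmerge")


def check_prerequisites() -> List[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH (install MKVToolNix)")
    return problems
```

Calling `check_prerequisites()` and printing each entry gives a quick diagnosis; the MKVToolNix entries only matter if you plan to open MKV files.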
- Clone the Repository:
  git clone https://github.com/pedronalis/sucata.git
- Navigate to the Directory:
  cd sucata
- Install Dependencies:
  pip install -r requirements.txt
- Configure the Model (Optional):
  - To use Llama or Qwen models, request access to them on Hugging Face.
  - If you don't have access, use an alternative open-source model.
- Run the program:
  python app.py
- In the interface:
  - Select an MKV or subtitle file (.srt/.ass/.ssa).
  - Choose the source language.
  - Click Start Translation and monitor the progress in the log.
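What happens after clicking Start Translation can be pictured as a loop over subtitle cues. The following is a minimal sketch for .srt input, assuming cue-by-cue translation; `translate` is a placeholder for the model call, and `Cue`, `parse_srt` and `translate_srt` are hypothetical names. Sucata's app.py may batch cues or structure this differently.

```python
"""Hypothetical sketch of an SRT translation pass that keeps numbering and timing."""
import logging
import re
from dataclasses import dataclass
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("sucata-sketch")

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:
    index: str
    timing: str
    text: str


def parse_srt(content: str) -> List[Cue]:
    """Split an SRT document into cues separated by blank lines."""
    cues = []
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) >= 2 and TIMING.match(lines[1]):
            cues.append(Cue(lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def format_srt(cues: List[Cue]) -> str:
    """Serialize cues back to SRT, one blank line between cues."""
    return "\n\n".join(f"{c.index}\n{c.timing}\n{c.text}" for c in cues) + "\n"


def translate_srt(content: str, translate: Callable[[str], str]) -> str:
    """Send only cue text to `translate`; index and timing lines are copied as-is."""
    cues = parse_srt(content)
    for n, cue in enumerate(cues, start=1):
        cue.text = translate(cue.text)
        log.info("Translated cue %d/%d", n, len(cues))  # progress, as in the app's log
    return format_srt(cues)
```

Keeping index and timing out of the prompt means the model cannot accidentally alter synchronization; files saved with a BOM are best read with `encoding="utf-8-sig"`.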
- Basic:
  - Use a small .srt file to test translations.
- Advanced:
  - Test with MKV files containing multiple subtitle tracks.
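For the basic test, one quick check is that translation changed only the text: the cue count and every timestamp should be identical before and after. This is a hypothetical helper, not a Sucata feature, and it assumes standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
"""Hypothetical sanity check for test runs: a translation should change text only."""
import re
from typing import List, Tuple

TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}.*$", re.MULTILINE
)


def timings(content: str) -> List[str]:
    """Collect every SRT timing line, in order."""
    return TIMING_LINE.findall(content.replace("\r\n", "\n"))


def compare_structure(original: str, translated: str) -> Tuple[bool, str]:
    """Check that cue count and timestamps survived translation unchanged."""
    before, after = timings(original), timings(translated)
    if len(before) != len(after):
        return False, f"cue count changed: {len(before)} -> {len(after)}"
    for n, (a, b) in enumerate(zip(before, after), start=1):
        if a != b:
            return False, f"timing of cue {n} changed: {a!r} -> {b!r}"
    return True, f"{len(before)} cues, timings intact"
```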
This program does not replace a professional translator and does not guarantee perfect translations. Although it uses advanced Artificial Intelligence models, some translations may contain errors or contextual inaccuracies.
To improve results, you can customize the prompt inside the app.py file to fit your desired language and style. This can help the AI produce translations better tailored to your needs.
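As an illustration of the kind of customization meant here, a parameterized prompt might look like the following. This is a hypothetical template, not the prompt that ships in app.py; the wording, the `build_prompt` name and the default style string are all assumptions.

```python
"""Hypothetical prompt template; the real prompt in app.py may be worded differently."""
from string import Template

PROMPT = Template(
    "You are a professional subtitle translator. Translate the following "
    "$source subtitle line into $target.\n"
    "Keep slang, cultural expressions and emotional tone natural for $target "
    "speakers. Style: $style.\n"
    "Return only the translated line, with the same line breaks.\n\n"
    "$text"
)


def build_prompt(text: str, source: str, target: str,
                 style: str = "casual, natural dialogue") -> str:
    """Fill the template; values containing '$' are inserted literally."""
    return PROMPT.substitute(source=source, target=target, style=style, text=text)
```

Changing `style` (for example to "formal" or "for children") or tightening the instructions is usually the cheapest way to steer tone before considering a different model.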
- Initial Release: Subtitle translation and track extraction.
- Implement support for additional languages.
- Add support for batch translations.
- Improve model optimization with fine-tuning.
- Fork the Repository and make your improvements!
- Submit a Pull Request with your changes.
- Report Bugs or suggest enhancements.
This project is licensed under the GNU AGPL.
Feel free to contribute! ❤️
- Hugging Face: For supporting open-source models.
- Tkinter Community: For accessible documentation.
- All contributors and testers helping to improve the project.