The purpose of the project is to create a high-quality translation database between Finnish and English. The primary goal is a linguistic database that can be used for training artificial intelligence, but all other kinds of use are also allowed and encouraged. The project requires a significant amount of time: creating and reviewing the first version took several weeks.
If you would like to help, please let me know.
- Hugging Face: https://huggingface.co/datasets/EkBass/fin-eng-dataset
- GitHub: https://github.com/EkBass/fin-eng-translations-set/
Current numbers:
- 20k Finnish lines, sentences, paragraphs, etc. translated to English.
- 36330 unique Finnish words.
- Around 35k actual distinct words, because the sentences contain a few names that are not "words" in the usual sense.
- 3 weeks of translating, figuring out proper sentences and copy/pasting from several places.
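A figure like the unique-word count above can be approximated with a few lines of Python. This is only a minimal sketch, not the method actually used for this dataset: the function name `unique_words`, the regex-based tokenization, and the sample sentences are all assumptions made for illustration. It lowercases each line and collects distinct tokens, so proper names are still counted as words unless they are filtered out separately.

```python
import re


def unique_words(lines):
    """Return the set of distinct lowercase word tokens found in lines.

    Hypothetical helper: the tokenization rule is an assumption, not the
    rule used to produce the numbers in this README.
    """
    words = set()
    for line in lines:
        # In Python 3, \w matches Unicode letters, so ä, ö and å are kept.
        for token in re.findall(r"\w+", line.lower()):
            # \w also matches digits and underscores; keep only tokens
            # that contain at least one letter.
            if any(ch.isalpha() for ch in token):
                words.add(token)
    return words


# Illustrative sample sentences, not taken from the dataset.
sample_lines = [
    "Hyvää huomenta!",
    "Hyvää yötä.",
    "Kissa istuu matolla.",
]

vocabulary = unique_words(sample_lines)
print(len(vocabulary))  # prints 6
```

Run over the Finnish side of the full dataset, the same function would give a rough vocabulary size. The result depends on the tokenization choices, so it may not match the numbers listed above exactly.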
Sources:
- Around 9k sentences and paragraphs were written by me and translated by software. I checked all of these translations manually.
- A considerable amount of IT vocabulary. https://github.com/TimoSalomaki/IT-sanasto/tree/master
- Public domain book "Avoin Elämä". https://avoinelama.fi/
- Sexual vocabulary. https://www.hyvakysymys.fi/artikkeli/seksuaalisuuden-sanakirja/
- Modern Finnish word list. https://www.kotus.fi/aineistot/sana-aineistot/nykysuomen_sanalista
- Wikipedia articles.
- Many Finnish laws.
- The laws were mostly translated with a machine translator, because I do not know the correct legal terms for them.
Kristian Virtanen, 2024
krisu.virtanen@gmail.com
For more information, see "license.md".