These resources are made available under the terms of the Creative Commons Attribution 4.0 International Public License (CC BY 4.0), available at https://creativecommons.org/licenses/by/4.0/, and are distributed without any warranty.
Information and contact: Xavier Gómez Guinovart (xgg2021@gmail.com)
File | Size | Description |
---|---|---|
SLI_CLUVI_EN_GL_TMX_3.4 | 19.7M | English-Galician sentence-level translation pairs (291,633 translated sentences) from the CLUVI Corpus (https://ilg.usc.gal/cluvi/) in XML TMX (Translation Memory eXchange) format |
SLI_CLUVI_ES_GL_TMX_3.4 | 30.7M | Spanish-Galician sentence-level translation pairs (318,863 translated sentences) from the CLUVI Corpus (https://ilg.usc.gal/cluvi/) in XML TMX (Translation Memory eXchange) format |
SLI_CLUVI_LEGA_TMX_2.1.tar.gz | 9.8M | LEGA Parallel Corpus of Galician-Spanish legal texts (6,582,415 words) at version 2.1 (https://ilg.usc.gal/cluvi/) in XML TMX (Translation Memory eXchange) format |
SLI_NERC_Galician_Gold_CoNLL.1.0.tar.gz | 460K | SLI NERC Galician Gold Corpus encoded in CoNLL format for machine learning in tasks of Named Entity Recognition and Classification |
SLI_NERC_Galician_Gold_FreeLing.1.0.tar.gz | 1.7M | SLI NERC Galician Gold Corpus encoded in FreeLing format for machine learning in tasks of Named Entity Recognition and Classification |
SLI_CTG_POS.1.1.tar.gz | 2.9M | CTG Galician Technical Corpus (http://ilg.usc.gal/ctg/) tagged with POS for machine learning and used for training by the IXA pipes tools (http://ixa2.si.ehu.es/ixa-pipes) |
SLI_CTG_Lemma.1.0.tar.gz | 3.2M | CTG Galician Technical Corpus (http://ilg.usc.gal/ctg/) lemmatised for machine learning and used for training by the IXA pipes tools (http://ixa2.si.ehu.es/ixa-pipes) |
SLI_CTG_POS_Lemma.1.1.tar.gz | 3.9M | CTG Galician Technical Corpus (http://ilg.usc.gal/ctg/) tagged with POS and lemmas for machine learning |
SLI_GalWeb.1.0.tar.gz | 302M | SLI GalWeb Corpus, a large Galician corpus (174,630,824 words) compiled by the SLI by crawling various domains, intended for machine learning and used for training by the IXA pipes tools (http://ixa2.si.ehu.es/ixa-pipes) |
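
As a quick sanity check after unpacking one of the TMX archives, the number of translation units can be compared with the sentence counts given in the table. The sketch below assumes that each translation pair is encoded as a standard TMX `<tu>` element; the function name `count_tus` is hypothetical, and the file names and layout inside the archives may differ from what is assumed here.

```shell
# count_tus FILE...: print each file name and the number of <tu> elements in it.
# Assumption: translation pairs are stored as standard TMX <tu> elements.
count_tus() {
    for f in "$@"; do
        # grep -o emits one line per match, so several <tu> on one line are all
        # counted; the pattern excludes <tuv> and closing </tu> tags.
        n=$(grep -oE '<tu([[:space:]>]|$)' "$f" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

For example, `count_tus *.tmx` run in the directory where the archive was extracted prints one tab-separated line per file.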
SLI_CLUVI_ES_GL_TMX_3.4 has been split into 25MB parts with the split command:
$ split -b 25M -d SLI_CLUVI_ES_GL_TMX_3.4.tar.gz SLI_CLUVI_ES_GL_TMX_3.4.tar.gz.part
To rejoin these parts after download, you can use the cat command:
$ cat SLI_CLUVI_ES_GL_TMX_3.4.tar.gz.part* > SLI_CLUVI_ES_GL_TMX_3.4.tar.gz
SLI_GalWeb.1.0.tar.gz has been split into 25MB parts with the split command:
$ split -b 25M -d SLI_GalWeb.1.0.tar.gz SLI_GalWeb.1.0.tar.gz.part
To rejoin these parts, you can use the cat command:
$ cat SLI_GalWeb.1.0.tar.gz.part* > SLI_GalWeb.1.0.tar.gz
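
Since a missing or damaged part only shows up once the archive is unpacked, it may be worth testing the rejoined file first. The following is a minimal sketch, not part of the original distribution: the helper name `rejoin_archive` is hypothetical, and it assumes all parts have been downloaded into the current directory with the names produced by the split commands above.

```shell
# rejoin_archive NAME: concatenate NAME.part* into NAME and test the gzip stream.
# Relies on the shell expanding NAME.part* in sorted order, which matches the
# numeric suffixes (part00, part01, ...) produced by split -d.
rejoin_archive() {
    name=$1
    cat "$name".part* > "$name" || return 1
    # gzip -t reads the whole compressed stream without writing any output
    # and fails if the data is truncated or corrupted.
    if gzip -t "$name"; then
        echo "OK: $name"
    else
        echo "FAILED: $name is incomplete or corrupted" >&2
        return 1
    fi
}
```

A possible use is `rejoin_archive SLI_GalWeb.1.0.tar.gz && tar -xzf SLI_GalWeb.1.0.tar.gz`, which extracts the archive only if the integrity test succeeds.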