wordEmbeddingsOG

Portuguese Word Embeddings for the specific domain of Oil and Gas

Word embeddings are among the fundamental building blocks of natural language processing algorithms. They represent words mathematically, capturing semantic and syntactic similarities from the contexts in which the words occur. This paper describes the process of generating the first set of word embedding models in Portuguese for the specific domain of oil and gas (O&G). A textual dataset (corpus) was assembled from several data sources published by reference institutions in this field. The generated models are qualitatively evaluated on their ability to represent technical terms in the O&G domain. We describe each step, from pre-processing and training to the results of the qualitative analysis. Finally, the scripts, corpus, and algorithms used in the study, as well as the generated models, are made available for public use.
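The idea of representing words by the contexts in which they occur can be illustrated with a small, standard-library-only sketch. This is not the method used by the scripts in this repository: it builds plain co-occurrence count vectors rather than trained embeddings, and the toy sentences, the function names (`tokenize`, `cooccurrence_vectors`, `cosine`) and the window size are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus; the real corpus is far larger and domain-specific.
sentences = [
    "o poço produz óleo e gás",
    "o campo produz óleo e gás",
    "a sonda perfura o poço",
]
vectors = cooccurrence_vectors(sentences)

# Words that share contexts ("poço" and "campo") score higher than words that do not.
print(cosine(vectors["poço"], vectors["campo"]))
print(cosine(vectors["poço"], vectors["sonda"]))
```

Trained embedding models replace these sparse counts with dense, learned vectors, but they are compared in the same way, typically by cosine similarity between the vectors of two terms.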

_corpus
geracaoEmbeddings
preProcessamento
.gitignore
README.md
RioOil2018_Word Embedding.pptx
Riooil2018_Artigo_completo.pdf

22renata/wordEmbeddingsOG
