Portuguese Word Embeddings for the specific domain of Oil and Gas

22renata/wordEmbeddingsOG

wordEmbeddingsOG

Word embeddings are among the fundamental building blocks of natural language processing algorithms. They represent words mathematically, as vectors, capturing the semantic and syntactic similarities implied by the contexts in which the words occur. This paper describes the process of generating the first set of word embedding models in Portuguese for the specific domain of oil and gas (O&G). A textual dataset (corpus) was compiled from several data sources published by reference institutions in this field. The generated models are evaluated qualitatively on their ability to represent technical terms from the O&G domain. We describe each step, from pre-processing through training to the results of the qualitative analysis. Finally, the scripts, corpus, and algorithms used in the study, as well as the generated models, are made publicly available.
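To illustrate what representing words as vectors and inspecting their similarity can look like, here is a minimal sketch of a nearest-neighbour query based on cosine similarity, a measure commonly used when examining embedding models qualitatively. The three-dimensional vectors below are made-up toy values, not taken from the models published in this repository, and the names `TOY_VECTORS`, `cosine_similarity`, and `most_similar` are hypothetical. Real models have many more dimensions and would be loaded from the released model files.

```python
import math

# Hypothetical toy embeddings (3 dimensions) for illustration only;
# they are NOT taken from the models published in this repository.
TOY_VECTORS = {
    "petróleo": [0.9, 0.1, 0.3],
    "óleo": [0.85, 0.15, 0.35],
    "gás": [0.7, 0.4, 0.2],
    "reservatório": [0.6, 0.2, 0.8],
    "futebol": [-0.5, 0.9, -0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank the other words in the vocabulary by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first, keep only the top `topn` neighbours.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


for neighbour, score in most_similar("petróleo", TOY_VECTORS):
    print(f"{neighbour}: {score:.3f}")
```

With these toy values, the domain terms "óleo", "gás", and "reservatório" rank as the closest neighbours of "petróleo", while the unrelated "futebol" is excluded. A qualitative evaluation of technical terms follows the same idea: check whether a term's nearest neighbours are plausible within the O&G domain.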
