Word embeddings are among the fundamental building blocks of natural language processing algorithms. They represent words as numerical vectors that capture semantic and syntactic similarities drawn from the contexts in which the words occur. This paper describes the process of generating the first set of word embedding models in Portuguese for the specific domain of oil and gas (O&G). A textual dataset (corpus) was assembled from several data sources published by reference institutions in this field. The generated models are evaluated qualitatively on their ability to represent technical terms in the O&G domain. We describe each step, from pre-processing and training to the results of the qualitative analysis. Finally, the scripts, corpus and algorithms used in the study, as well as the generated models, are made available for public use.
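As a rough illustration of what a qualitative evaluation of this kind can look like, the sketch below ranks the nearest neighbours of a technical term by cosine similarity. It is only a minimal example under stated assumptions: the three-dimensional vectors in `toy_vectors` are made up for illustration and are not taken from the generated models, and the helper names `cosine` and `most_similar` are hypothetical rather than part of the repository's scripts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, topn=3):
    """Return the topn (word, similarity) pairs closest to `term`."""
    query = vectors[term]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy vectors, invented for this example only.
# Real embedding models typically use far more dimensions.
toy_vectors = {
    "poço": [0.9, 0.1, 0.0],        # well
    "perfuração": [0.8, 0.3, 0.1],  # drilling
    "broca": [0.7, 0.4, 0.0],       # drill bit
    "refino": [0.1, 0.9, 0.2],      # refining
    "gasolina": [0.0, 0.8, 0.5],    # gasoline
}

if __name__ == "__main__":
    for word, score in most_similar("poço", toy_vectors):
        print(f"{word}\t{score:.3f}")
```

With trained models, the same kind of inspection would be run against the actual embedding vectors, checking whether the neighbours of a technical term are terms a domain specialist would consider related.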
forked from fabiocorreacordeiro/wordEmbeddingsOG
22renata/wordEmbeddingsOG
Portuguese Word Embeddings for the specific domain of Oil and Gas