An improvement of shallow word embeddings to challenge deep ones! Deep learning is not always the solution!
Converting human-understandable text into mathematically processable data was a research topic even before concepts such as artificial intelligence and natural language processing (NLP) emerged. In recent years, this conversion has become more widely used and better known as NLP tasks have become more common. Because NLP tasks depend strongly on word embeddings, researchers continually look for better and more efficient word embedding approaches.
In this project, we present a method that improves the performance of word embeddings and addresses some of the problems of current embedding methods. The approach builds on existing statistical methods and changes the data structure of classical word embeddings. It can be used either to improve existing embeddings or, on its own, to produce new embeddings.
- Link to the paper (soon)
- Link to the article (soon)
Prerequisites
This project uses Python as its main programming language, although we used MATLAB for some preprocessing steps. We recommend using the latest version of Python to avoid errors.
1) Series generation
In this phase, shallow word embeddings such as Word2Vec, GloVe, or fastText are converted into series, which are treated as signals in the next step. The "SeriesGeneration" folder contains the code for this conversion. We used some example embeddings to generate the signals; however, you can use any word embedding, either shallow or deep.
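To make the idea of "embedding as a signal" concrete, here is a minimal sketch that is not the code in the "SeriesGeneration" folder. It assumes each word vector is read as a 1D series indexed by dimension and optionally z-normalized, and it assumes GloVe-style text input (a word followed by its space-separated components). The helper names `parse_glove_lines` and `to_series` are hypothetical.

```python
"""Hypothetical sketch: turn word vectors into 1D series ("signals")."""
import math
from typing import Dict, List


def parse_glove_lines(lines: List[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: a word followed by its float components."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def to_series(vector: List[float], normalize: bool = True) -> List[float]:
    """Treat the vector as a series over its dimensions; optionally z-normalize it.

    The z-normalization step is an assumption, not necessarily what the project does.
    """
    if not normalize:
        return list(vector)
    mean = sum(vector) / len(vector)
    std = math.sqrt(sum((v - mean) ** 2 for v in vector) / len(vector))
    if std == 0:
        return [0.0] * len(vector)  # constant vector: nothing to rescale
    return [(v - mean) / std for v in vector]


if __name__ == "__main__":
    sample = ["king 0.5 1.0 -0.5 0.0", "queen 0.4 0.9 -0.6 0.1"]
    embeddings = parse_glove_lines(sample)
    series = {word: to_series(vec) for word, vec in embeddings.items()}
    print([round(v, 3) for v in series["king"]])
```

Normalizing each series puts all words on a comparable scale before the 2D step, so distance thresholds are not dominated by vectors with large norms; drop it with `normalize=False` if raw magnitudes matter.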
2) Building 2D structure
In the second phase, the signals are converted into a matrix structure using statistical methods. We used two different approaches for this, and both are in the "RP" folder. Only one of the RP files can be used in each test.
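The folder name suggests recurrence plots, but that is an assumption here, as is the choice of the two variants below. A recurrence plot turns a series $x_1, \dots, x_n$ into an $n \times n$ matrix with $R_{ij} = \Theta(\varepsilon - |x_i - x_j|)$, where $\Theta$ is the Heaviside step function. The sketch shows an unthresholded (distance) version and a thresholded binary version; the default `eps` of 10% of the largest distance is a hypothetical choice, and the function names are illustrative only.

```python
"""Hypothetical sketch: recurrence-plot style 2D matrices from a 1D series."""
from typing import List, Optional


def unthresholded_rp(series: List[float]) -> List[List[float]]:
    """Pairwise distance matrix D[i][j] = |x_i - x_j| (no threshold applied)."""
    n = len(series)
    return [[abs(series[i] - series[j]) for j in range(n)] for i in range(n)]


def recurrence_plot(series: List[float], eps: Optional[float] = None) -> List[List[int]]:
    """Binary recurrence plot: R[i][j] = 1 if |x_i - x_j| <= eps, else 0."""
    if not series:
        return []
    dist = unthresholded_rp(series)
    if eps is None:
        # Assumed default: 10% of the largest pairwise distance.
        eps = 0.1 * max(max(row) for row in dist)
    return [[1 if d <= eps else 0 for d in row] for row in dist]


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.05]
    for row in recurrence_plot(signal, eps=0.1):
        print(row)
```

Either matrix can then be treated like an image; the binary version is compact but sensitive to `eps`, while the distance version keeps all the information at the cost of a real-valued matrix.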
3) Evaluation
As explained in the paper and the article, we used several evaluation methods, including intrinsic and extrinsic evaluations. The "evaluation" folder contains all the methods used for the evaluations.
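A common intrinsic evaluation for word embeddings is word similarity: compute the cosine similarity of each word pair and report the Spearman rank correlation with human similarity scores. The sketch below illustrates that general technique; it is not necessarily one of the methods in the "evaluation" folder, and the helper names and the toy data are hypothetical. Ties get average ranks, and pairs with out-of-vocabulary words are skipped.

```python
"""Hypothetical sketch: word-similarity evaluation with Spearman correlation."""
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def word_similarity_score(
    embeddings: Dict[str, List[float]], pairs: List[Tuple[str, str, float]]
) -> float:
    """Correlate model cosine similarities with human scores, skipping OOV pairs."""
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(gold)
    return spearman(model, human)
```

Because Spearman uses ranks, only the ordering of similarities matters, which makes scores comparable across embeddings whose cosine values live on different scales.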
Zahra Arshia
- GitHub: @ZahraArshia
- LinkedIn: @ZahraArshia
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
Give a ⭐️ if you like this project!
We would like to acknowledge the MUT NLP Lab for its support.
This project is MIT licensed.