A Comparison of LSTM and BERT Based Models in Turkish Keyword Extraction

huseink/Turkish-Keyword-Extraction

Abstract

Text-based data on the internet is growing rapidly, and retrieving the right content from this mass of data is an important need. Knowing a document's keywords can help meet this need. This study aims to extract keywords that represent Turkish texts using natural language processing and deep learning models. The Turkish Labeled Text Corpus and the Text Summarization Keyword Extraction Data Set were used together as the data set. Two deep learning models are presented. The first is a Sequence-to-Sequence (Seq2Seq) model with Long Short-Term Memory (LSTM) layers; the second is a Seq2Seq model based on BERT (Bidirectional Encoder Representations from Transformers). In evaluation, the LSTM-based Seq2Seq model achieved a ROUGE-1 F1 score of 0.38, while the BERT-based Seq2Seq model achieved a ROUGE-1 F1 score of 0.399. These results indicate that the BERT-based Seq2Seq model, built on the Transformer architecture, is more successful than the LSTM-based Seq2Seq model.
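Both models above are scored with the ROUGE-1 F1 criterion, i.e. the harmonic mean of unigram precision and recall between the predicted and reference keywords. As a minimal sketch (not the repository's actual evaluation code), the metric can be computed like this:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between reference and candidate keyword strings.

    This is an illustrative re-implementation, not the evaluation script used
    in the study; real evaluations typically use a library such as rouge-score.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each shared unigram counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(cand_counts.values())
    return 2 * precision * recall / (precision + recall)

# Example with hypothetical keyword strings (2 of 4 reference unigrams matched):
score = rouge1_f1("derin ogrenme anahtar sozcuk", "derin ogrenme metin")
```

Here recall is 2/4 and precision is 2/3, giving an F1 of 4/7 ≈ 0.571; a score of 0.399, as reported for the BERT-based model, corresponds to roughly this level of unigram overlap averaged over the test set.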

Keywords:

Keyword Extraction, Deep Learning, Seq2Seq Architecture, Transformer Architecture

Türkçe Anahtar Sözcük Çıkarımında LSTM ve BERT Tabanlı Modellerin Karşılaştırılması (Turkish title; in English: A Comparison of LSTM and BERT Based Models in Turkish Keyword Extraction)
