Abstract
The volume of text-based data on the internet is growing rapidly, and retrieving content that contains the desired information from this large body of data is an important need. Knowing the keywords of a document can help meet this need. This study aims to determine keywords that represent Turkish texts using natural language processing and deep learning models. The Turkish Labeled Text Corpus and the Text Summarization and Keyword Extraction Data Set were used together as the data set. Two deep learning models are presented. First, a Sequence-to-Sequence (Seq2Seq) model with Long Short-Term Memory (LSTM) layers was designed. The second is a Seq2Seq model with BERT (Bidirectional Encoder Representations from Transformers). In the evaluation, the LSTM-based Seq2Seq model achieved an F1 score of 0.38 on the ROUGE-1 metric, while the BERT-based Seq2Seq model achieved an F1 score of 0.399. As a result, the BERT-based Seq2Seq model, built on the Transformer architecture, proved more successful than the LSTM-based Seq2Seq model.
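For reference, the ROUGE-1 F1 score used above measures unigram overlap between predicted and gold keyword sequences. A minimal sketch of how such a score can be computed is shown below; this is an illustration of the metric itself, not the authors' evaluation code:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the predicted keywords "deep learning model" against the reference "deep learning" yields a precision of 2/3, a recall of 1, and thus an F1 of 0.8.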
Keywords:
Keyword Extraction, Deep Learning, Seq2Seq architecture, Transformer architecture
Comparison of LSTM- and BERT-Based Models for Turkish Keyword Extraction