This repository contains a brief introduction to feature extraction from text-based data.
The textual data is in the resort.txt file.
The pre-processing steps for textual data are explained in Pre-processing of file. The basic pre-processing steps include: tokenization of words and sentences, removal of punctuation, removal of stop-words, stemming of words, and lemmatization of words.
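
The first three steps can be sketched with only the Python standard library. This is a minimal illustration, not the repository's own code: the sample text, the `STOP_WORDS` set, and the helper names `split_sentences` and `preprocess` are all hypothetical. Stemming and lemmatization are left out because they usually rely on an NLP library.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "was"}


def split_sentences(text):
    """Naive sentence tokenization: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(sentence):
    """Lowercase, remove punctuation, tokenize on whitespace, drop stop-words."""
    table = str.maketrans("", "", string.punctuation)
    no_punct = sentence.lower().translate(table)
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


# Made-up sample text; the real data would be read from resort.txt.
sample = "The resort was lovely. The pool is clean and the staff is friendly!"
sentences = split_sentences(sample)
tokens = [preprocess(s) for s in sentences]
```

Here `tokens` holds one list of cleaned word tokens per sentence, which is the form the later feature-extraction steps work on.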
The binary feature of the data (1 if a particular word exists in a sentence, 0 if it does not) is explained in Binary
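
As a rough sketch of the binary feature, assuming the sentences are already tokenized: each sentence becomes a row with one column per vocabulary word. The example sentences and the function names `build_vocabulary` and `binary_features` are hypothetical.

```python
def build_vocabulary(tokenized_sentences):
    """Collect every distinct word and sort it so the column order is stable."""
    return sorted({tok for sent in tokenized_sentences for tok in sent})


def binary_features(tokenized_sentences, vocabulary):
    """Return one row per sentence: 1 if the word occurs in it, otherwise 0."""
    rows = []
    for sent in tokenized_sentences:
        present = set(sent)
        rows.append([1 if word in present else 0 for word in vocabulary])
    return rows


# Made-up, already pre-processed sentences.
docs = [["pool", "clean", "pool"], ["staff", "friendly", "clean"]]
vocab = build_vocabulary(docs)
binary = binary_features(docs, vocab)
```

Note that "pool" appears twice in the first sentence but still maps to 1, since only presence is recorded.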
The computation of the count vector, which stores the frequency of each word in a sentence, is explained in
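
A count vector can be sketched in a similar way with `collections.Counter`. The example sentences and the function name `count_vectors` are hypothetical, and the vocabulary is simply the sorted set of all words.

```python
from collections import Counter


def count_vectors(tokenized_sentences, vocabulary):
    """Return one row per sentence holding how often each vocabulary word occurs."""
    rows = []
    for sent in tokenized_sentences:
        counts = Counter(sent)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return rows


# Made-up, already pre-processed sentences.
docs = [["pool", "clean", "pool"], ["staff", "friendly", "clean"]]
vocab = sorted({tok for sent in docs for tok in sent})
counts = count_vectors(docs, vocab)
```

Unlike the binary feature, the repeated word "pool" now contributes a count of 2 in the first row.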
The calculation of the TF (Term Frequency) matrix and the TF-IDF (Term Frequency-Inverse Document Frequency) matrix is explained in and
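
The sketch below shows one common way to compute both matrices, treating each sentence as a document. It assumes TF is the word count divided by the sentence length and IDF is log(N / df), where N is the number of sentences and df is the number of sentences containing the word. Other variants (smoothing, normalization) exist, and the function names and example sentences here are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokenized_sentences, vocabulary):
    """TF: count of each word divided by the number of tokens in the sentence."""
    rows = []
    for sent in tokenized_sentences:
        counts = Counter(sent)
        total = len(sent)
        rows.append([counts[w] / total if total else 0.0 for w in vocabulary])
    return rows


def inverse_document_frequency(tokenized_sentences, vocabulary):
    """IDF (assumed variant): log(N / df), with no smoothing."""
    n = len(tokenized_sentences)
    idf = []
    for w in vocabulary:
        df = sum(1 for sent in tokenized_sentences if w in sent)
        idf.append(math.log(n / df) if df else 0.0)
    return idf


def tf_idf(tokenized_sentences, vocabulary):
    """TF-IDF: element-wise product of each TF row with the IDF vector."""
    tf = term_frequency(tokenized_sentences, vocabulary)
    idf = inverse_document_frequency(tokenized_sentences, vocabulary)
    return [[t * i for t, i in zip(row, idf)] for row in tf]


# Made-up, already pre-processed sentences.
docs = [["pool", "clean", "pool"], ["staff", "friendly", "clean"]]
vocab = sorted({tok for sent in docs for tok in sent})
tf = term_frequency(docs, vocab)
tfidf = tf_idf(docs, vocab)
```

With this variant, a word that occurs in every sentence ("clean" here) gets an IDF of 0 and so a TF-IDF weight of 0, while words unique to one sentence keep a positive weight.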