Text-Based-Feature-Extraction-using-Python

This repository gives a brief introduction to feature extraction from text-based data.

The textual data is in the resort.txt file.

The pre-processing steps for the textual data are explained in the Pre-processing of Data.py file. The basic steps include tokenization of words and sentences, removal of punctuation, removal of stop words, stemming of words, and lemmatization of words.
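The pre-processing steps listed here can be sketched with the standard library alone. This is a minimal illustration, not the code in Pre-processing of Data.py: the stop-word set, the lemma table, the crude suffix-stripping stemmer, and the sample text are all assumptions made for the example. Libraries such as NLTK provide complete tokenizers, stemmers, and lemmatizers.

```python
import re
import string

# Toy stop-word list and lemma table, chosen for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "was", "were"}
LEMMAS = {"rooms": "room", "were": "be", "was": "be"}
PUNCTUATION = set(string.punctuation)


def sentence_tokenize(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Split a sentence into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def remove_punctuation(tokens):
    """Drop tokens that are single punctuation characters."""
    return [t for t in tokens if t not in PUNCTUATION]


def remove_stop_words(tokens):
    """Drop tokens found in the (toy) stop-word set, ignoring case."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the toy lemma table; fall back to lowercase."""
    return LEMMAS.get(word.lower(), word.lower())


# Hypothetical sample text, not taken from resort.txt.
text = "The rooms were clean. We loved the swimming pool!"
sentences = sentence_tokenize(text)

# Stemming pipeline on the second sentence.
content = remove_stop_words(remove_punctuation(word_tokenize(sentences[1])))
stems = [crude_stem(t.lower()) for t in content]

# Lemmatization on the first sentence.
lemmas = [lemmatize(t) for t in remove_punctuation(word_tokenize(sentences[0]))]

print(sentences)
print(stems)
print(lemmas)
```

Stemming cuts words down mechanically, so the result need not be a dictionary word, whereas lemmatization maps a word to its dictionary form and therefore needs a vocabulary to look it up in.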

Binary features record whether a particular word exists in a sentence (1) or not (0); they are explained in Binary Features.py.
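As a rough sketch of the idea, assuming a simple regex tokenizer and two made-up sentences (neither is taken from Binary Features.py or resort.txt), a binary feature matrix has one row per sentence and one column per vocabulary word:

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def binary_features(sentences):
    """Return the sorted vocabulary and a 0/1 matrix, one row per sentence."""
    tokenized = [set(tokenize(s)) for s in sentences]
    vocabulary = sorted(set().union(*tokenized)) if tokenized else []
    matrix = [[1 if word in words else 0 for word in vocabulary]
              for words in tokenized]
    return vocabulary, matrix


# Hypothetical example sentences.
sentences = ["The pool was clean", "The food was good, really good"]
vocabulary, matrix = binary_features(sentences)

print(vocabulary)
for row in matrix:
    print(row)
```

A repeated word such as "good" in the second sentence still yields 1, since only presence is recorded.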

The computation of the count vector, which stores the frequency of each word in a sentence, is explained in CountVector.py.
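A count vector can be sketched in the same way, replacing the 0/1 entries with word counts. The tokenizer and sentences below are assumptions for illustration and are not taken from CountVector.py:

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def count_vectors(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    counts = [Counter(tokenize(s)) for s in sentences]
    vocabulary = sorted(set().union(*counts)) if counts else []
    # Counter returns 0 for words that do not occur in a sentence.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical example sentences.
sentences = ["The pool was clean", "The food was good, really good"]
vocabulary, vectors = count_vectors(sentences)

print(vocabulary)
for row in vectors:
    print(row)
```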

The calculation of the TF (Term Frequency) matrix and the TF-IDF (Term Frequency-Inverse Document Frequency) matrix is explained in TF_matrix.py and TF-IDF_Matrix.py, respectively.
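Several TF and IDF variants are in use, and this README does not say which one the scripts implement. The sketch below assumes one common textbook form: TF is a word's count divided by the number of words in the sentence, and IDF is the natural log of the number of sentences divided by the number of sentences containing the word. The function names and sample sentences are hypothetical, and every sentence is assumed to be non-empty.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tf_matrix(docs, vocabulary):
    """TF(w, d) = count of w in d / number of tokens in d."""
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        rows.append([counts[w] / len(tokens) for w in vocabulary])
    return rows


def idf_vector(docs, vocabulary):
    """IDF(w) = ln(N / number of documents containing w)."""
    n = len(docs)
    doc_sets = [set(tokens) for tokens in docs]
    return [math.log(n / sum(1 for s in doc_sets if w in s)) for w in vocabulary]


def tf_idf_matrix(docs, vocabulary):
    """Multiply each TF entry by the IDF of its column's word."""
    idf = idf_vector(docs, vocabulary)
    return [[tf * weight for tf, weight in zip(row, idf)]
            for row in tf_matrix(docs, vocabulary)]


# Hypothetical example sentences, each treated as one document.
sentences = ["The pool was clean", "The food was good"]
docs = [tokenize(s) for s in sentences]
vocabulary = sorted({w for tokens in docs for w in tokens})

tf = tf_matrix(docs, vocabulary)
tf_idf = tf_idf_matrix(docs, vocabulary)

print(vocabulary)
print(tf[0])
print(tf_idf[0])
```

With this unsmoothed IDF, words that occur in every sentence ("the", "was") get a weight of zero, which is the intended effect: they do not help tell sentences apart. Some libraries, scikit-learn among them, apply a smoothed IDF by default, so their numbers will differ from this sketch.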

Latest commit: SrishtiVashishtha, "Add files via upload", Jul 16, 2020 (bad85a7); 5 commits in history.

Files: Binary Features.py, CountVector.py, Pre-processing of Data.py, README.md, TF-IDF_Matrix.py, TF_matrix.py, resort.txt

SrishtiVashishtha/Text-Based-Feature-Extraction-using-Python
