Natural Language Processing

This module contains methods for several natural language processing (NLP) tasks, such as:

P R E P R O C E S S I N G:

Although many high-level APIs are available, cleaning the text manually allows for more customization.

  • Loading the data and selecting relevant parts
  • Removing punctuation
  • Removing words shorter than a chosen length
  • Replacing numbers with their word-based equivalent
  • Removing stopwords
  • Lemmatization
  • Tokenization
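
The cleaning steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the module's actual implementation: the function name `preprocess`, the `min_length` parameter, the tiny `STOPWORDS` set, and the digit-by-digit `DIGIT_WORDS` mapping are all assumptions made for the example. Lemmatization is left out because it needs a lexical resource such as a dictionary or a trained model.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "it"}

# Hypothetical mapping: numbers are spelled out digit by digit ("42" -> "four two").
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, min_length=3):
    """Clean a raw string and return a list of tokens."""
    text = text.lower()
    # Replace each ASCII digit with its word, padded so it becomes its own token.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Remove punctuation by replacing it with spaces.
    text = text.translate(PUNCT_TO_SPACE)
    # Tokenize on whitespace.
    tokens = text.split()
    # Drop short words and stopwords.
    return [t for t in tokens if len(t) >= min_length and t not in STOPWORDS]


print(preprocess("The 2 cats are sitting, in the garden!"))
# -> ['two', 'cats', 'sitting', 'garden']
```

Doing these steps by hand makes each choice explicit, for example whether punctuation is deleted or replaced by a space, which is the kind of customization a high-level API may hide.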

N L P:

  • Sentiment Analysis
  • Part-of-Speech-Tagging (POS-Tagging)
  • Named-Entity-Recognition (NER)
  • TF-IDF Scoring
  • Cosine Similarity
  • MinHashing
  • Word Embedding
  • Latent Semantic Analysis (LSA)
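
Two of the tasks listed above, TF-IDF scoring and cosine similarity, are often used together to compare documents. The following is a minimal sketch under stated assumptions, not the module's own code: the function names `tfidf_vectors` and `cosine_similarity` are hypothetical, term frequency is taken as the relative count within a document, and inverse document frequency as the unsmoothed `log(N / df)`. Libraries frequently use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # An all-zero vector is treated as similar to nothing.
    return dot / (norm_u * norm_v)


docs = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "hat"],
    ["dog", "ran", "far"],
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "cat" and "sat": between 0 and 1
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

With the unsmoothed formula, a term that occurs in every document gets weight zero, so it contributes nothing to the similarity.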