positive or negative (e.g. sentiment analysis), by function, intention or purpose, or by industry or other categories for analytics and trending
One of the most widely used natural language processing tasks across business problems is “Text Classification”. The goal of text classification is to automatically assign text documents to one or more predefined categories. Some examples of text classification are:
Understanding audience sentiment from social media,
Detection of spam and non-spam emails,
Auto tagging of customer queries, and
Categorization of news articles into defined topics.
Applications of text classification range from spam filtering and sentiment analysis to content tagging and classification.
Text classification is an example of a supervised machine learning task, since a labelled dataset containing text documents and their labels is used to train a classifier. An end-to-end text classification pipeline is composed of three main components:
Dataset Preparation: The first step loads a dataset and performs basic pre-processing. The dataset is then split into train and validation sets.
Feature Engineering: The next step transforms the raw dataset into flat features that can be used in a machine learning model. This step also includes creating new features from the existing data.
Model Training: The final step trains a machine learning model on the labelled dataset.
Improve Performance of Text Classifier: In this article, we will also look at the different ways to improve the performance of text classifiers.
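The pipeline steps above can be sketched with scikit-learn, which is listed in the requirements. This is a minimal illustration under stated assumptions, not the code this package uses: the toy sentences and labels are invented for the example, and the choice of TF-IDF features with logistic regression is only one reasonable option among many.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy labelled dataset, invented purely for illustration.
texts = [
    "I love this product",
    "Great service and friendly staff",
    "Absolutely wonderful experience",
    "Best purchase I have made",
    "I hate this product",
    "Terrible service and rude staff",
    "Absolutely awful experience",
    "Worst purchase I have made",
]
labels = ["positive"] * 4 + ["negative"] * 4

# 1. Dataset preparation: split into train and validation sets.
train_texts, valid_texts, train_labels, valid_labels = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# 2. Feature engineering and 3. model training, chained in one pipeline.
#    TF-IDF over unigrams and bigrams is an assumed feature choice.
classifier = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])
classifier.fit(train_texts, train_labels)

# Evaluate on the held-out validation set and classify a new sentence.
valid_accuracy = classifier.score(valid_texts, valid_labels)
prediction = classifier.predict(["wonderful staff"])[0]
print("validation accuracy:", valid_accuracy)
print("prediction:", prediction)
```

Settings such as ngram_range on the vectorizer or the regularization strength of the model are typical knobs to adjust when trying to improve performance. A dataset this small says nothing reliable about accuracy; it only shows how the pieces fit together.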
Requirements:
spaCy
Scikit-learn
USAGE:
from sentlyzer import sentilyze
load = sentilyze()
text = "Sample Text"
# Assumption: predict is a method of the object created above.
cls = load.predict(text)
print(cls)