NLP_Statistical_Author_Features_Prediction

Title: Multi-label Text Classification for Blog Post Labeling

Business Objective: To develop a multi-label text classification model that can accurately predict and assign relevant labels to blog posts based on their content, including gender, age, topic, and sign.

Approach: The code preprocesses the text by removing unwanted characters, converting it to lowercase, stripping extra whitespace, and eliminating stopwords. It then merges the label columns (gender, age, topic, and sign) into a single column, so each post carries a set of labels. The dataset is split into training and testing sets, and CountVectorizer is fit on the training text to build a document-term matrix of word counts. A OneVsRestClassifier wraps a Logistic Regression classifier, training one binary classifier per label for multi-label classification. The model is evaluated with accuracy, F1 score, average precision, and average recall. Finally, the predicted labels are compared with the true labels for selected examples.
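
The preprocessing and label-merging steps might look roughly like the sketch below. This is not the notebook's actual code: the function names `clean_text` and `merge_labels`, the letters-only regular expression, and the tiny inline stopword set are all assumptions made for illustration (the project lists nltk, whose English stopword corpus would be the fuller choice).

```python
import re

# Tiny illustrative stopword set (assumption); nltk's English stopword
# corpus would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "and", "to", "of", "in", "i", "my"}


def clean_text(text: str) -> str:
    """Keep letters only, lowercase, collapse whitespace, drop stopwords."""
    # Replace every run of non-letter characters with a single space.
    text = re.sub(r"[^A-Za-z]+", " ", text)
    text = text.lower().strip()
    # split() with no arguments also collapses any repeated whitespace.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


def merge_labels(gender, age, topic, sign) -> list:
    """Combine the four label columns into one list of string labels."""
    return [str(gender), str(age), str(topic), str(sign)]


cleaned = clean_text("  I LOVE my new Laptop!!! It is the best, 10/10. ")
merged = merge_labels("male", 25, "Technology", "Leo")
print(cleaned)
print(merged)
```

With pandas, functions like these could be applied row by row, for example with `DataFrame.apply`, to produce a cleaned text column and a single merged label column.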

Tools Used: Python, pandas, numpy, scikit-learn, nltk
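
The vectorization, classification, and evaluation steps described in the approach can be sketched with scikit-learn as follows. The toy texts and labels are made up for illustration, and several choices are assumptions rather than the notebook's settings: `MultiLabelBinarizer` for encoding the merged labels, micro-averaging for F1, precision, and recall, the 25% test split, and `max_iter=1000`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up, already-cleaned posts with merged labels (illustrative only).
texts = [
    "love new laptop fast processor",
    "football match great goal",
    "python code bug fixed today",
    "team won league cup final",
    "bought graphics card gaming laptop",
    "coach praised striker goal",
    "wrote code new python library",
    "match ended penalty shootout goal",
]
labels = [
    ["male", "Technology"], ["female", "Sports"],
    ["female", "Technology"], ["male", "Sports"],
    ["male", "Technology"], ["female", "Sports"],
    ["female", "Technology"], ["male", "Sports"],
]

# Encode each label set as a binary indicator row (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

X_train, X_test, Y_train, Y_test = train_test_split(
    texts, Y, test_size=0.25, random_state=0
)

# Fit the vocabulary on training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# One binary Logistic Regression classifier is trained per label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train_counts, Y_train)
Y_pred = clf.predict(X_test_counts)

# For multi-label targets, accuracy_score is exact-match (subset) accuracy.
accuracy = accuracy_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred, average="micro", zero_division=0)
precision = precision_score(Y_test, Y_pred, average="micro", zero_division=0)
recall = recall_score(Y_test, Y_pred, average="micro", zero_division=0)
print(f"accuracy={accuracy:.2f} f1={f1:.2f} "
      f"precision={precision:.2f} recall={recall:.2f}")

# Compare predicted and true label sets for the held-out examples.
for text, pred, true in zip(
    X_test, mlb.inverse_transform(Y_pred), mlb.inverse_transform(Y_test)
):
    print(text, "| predicted:", pred, "| true:", true)
```

Because multi-label accuracy requires every label of a post to match, it is usually the strictest of the four metrics; the micro-averaged scores credit partially correct predictions.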


SelvaJenner/NLP_Statistical_Author_Features_Prediction
