This project uses machine learning, specifically natural language processing (NLP), to classify text. Students will implement and train a one-dimensional (1D) convolutional neural network (CNN) that takes a piece of text as input and outputs a class label for it.
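To make the idea of a 1D CNN over text concrete, here is a minimal sketch of a single forward pass in plain Python. It is not the project's model: the layer order (embedding lookup, 1D convolution, ReLU, global max pooling, sigmoid output) is an assumed, common design, and the function name `conv1d_text_forward` and all weights are hypothetical. A real implementation would use a deep learning library and learn the weights during training.

```python
import math

def conv1d_text_forward(token_ids, embeddings, filters, biases, out_weights, out_bias):
    """Embed tokens, apply 1D convolution + ReLU, max-pool over time, then sigmoid."""
    # One embedding vector per token: a (seq_len x embed_dim) matrix.
    x = [embeddings[t] for t in token_ids]
    pooled = []
    for filt, bias in zip(filters, biases):
        width = len(filt)  # the filter covers `width` consecutive tokens
        best = 0.0         # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            # Dot product between the filter and the current window of embeddings.
            act = bias + sum(
                w * v
                for w_row, x_row in zip(filt, window)
                for w, v in zip(w_row, x_row)
            )
            best = max(best, max(0.0, act))  # ReLU, then running max over positions
        pooled.append(best)
    # Dense output layer with a sigmoid, giving P(class = 1).
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: 2-word vocabulary, 1-dimensional embeddings, one filter of width 2.
toy_embeddings = [[1.0], [2.0]]
toy_filters = [[[1.0], [1.0]]]
probability = conv1d_text_forward([0, 1, 1], toy_embeddings, toy_filters, [0.0], [1.0], -4.0)
```

Sliding the filter over consecutive tokens is what makes the convolution one-dimensional: it moves along the sequence axis only, and max pooling keeps the strongest response regardless of where in the text it occurred.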
Identify hate speech by classifying comments as toxic or not toxic.
Disclaimer: This dataset contains text that may be considered profane, vulgar, or offensive.
Classify the sentiment (negative or positive) of movie reviews.
Students are encouraged to try out other datasets of interest and to extend the task to multi-class text classification.
As a bonus task, students can write a script that uses the trained model to classify user input in real time.
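One possible shape for the bonus script is sketched below. It assumes the trained model is wrapped in a function that maps a string to the probability of the positive class; `keyword_stub` is a hypothetical stand-in for that wrapper, and the names `classify_stream`, `threshold`, and `labels` are illustrative rather than part of the project.

```python
def classify_stream(lines, predict_proba, threshold=0.5, labels=("negative", "positive")):
    """Yield (text, label, probability) for each non-empty line of input."""
    for text in lines:
        text = text.strip()
        if not text:
            continue  # skip blank lines
        prob = predict_proba(text)
        # int(False) == 0 selects labels[0]; int(True) == 1 selects labels[1].
        yield text, labels[int(prob >= threshold)], prob

def keyword_stub(text):
    """Hypothetical placeholder for the trained model; replace with a real prediction call."""
    return 0.9 if "good" in text.lower() else 0.1

results = list(classify_stream(["A good film\n", "\n", "Terrible\n"], keyword_stub))
```

For interactive use, the same generator can be fed from `sys.stdin` so that each typed line is classified as soon as it is entered. For the toxicity task, the labels would be swapped accordingly.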
- Laptop/PC
- Access to stable Internet
- Google Colaboratory
- Install all the necessary libraries. Run the following command in your Jupyter Notebook:
!pip install -r requirements.txt
- Download the GloVe word vectors
!wget http://nlp.stanford.edu/data/glove.6B.zip
- Unzip the GloVe archive
!unzip -q glove.6B.zip
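Once the archive is unzipped, the vectors need to be read into memory before they can initialize an embedding layer. The sketch below assumes the standard GloVe text format (one token per line followed by its space-separated vector components) and assumes the 100-dimensional file `glove.6B.100d.txt` is the one used; the project may use a different dimension. The function names are hypothetical.

```python
def parse_glove_lines(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into a {word: [floats]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors

def load_glove(path="glove.6B.100d.txt"):
    """Read a GloVe text file from disk; the default path is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle)

# Small in-memory example using the same line format.
sample = parse_glove_lines(["the 0.1 -0.2 0.3\n", "cat 1.0 2.0 3.0\n"])
```

Words in the dataset vocabulary that are missing from the returned dictionary are typically given a zero or randomly initialized vector when the embedding matrix is built.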
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
@article{buerle2019net2vis,
title={Net2Vis -- A Visual Grammar for Automatically Generating Publication-Ready CNN Architecture Visualizations},
author={Alex Bäuerle and Christian van Onzenoodt and Timo Ropinski},
year={2019},
eprint={1902.04394},
archivePrefix={arXiv},
primaryClass={cs.LG}
}