NeuralNews

A machine learning project that categorizes news articles from their headlines using a fine-tuned DistilBERT model, trained on nearly 210,000 HuffPost articles published between 2012 and 2022.

Overview

DistilBERT is a distilled version of BERT that retains 97% of BERT's language understanding while being 40% smaller and 60% faster. Here, the pre-trained model has been fine-tuned on the News Category Dataset from Kaggle, which consists of around 210k headlines.
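
A sequence classifier of this kind produces one raw score (logit) per category for each headline, and the prediction is the category with the highest score. The sketch below shows only that last step in plain Python. The function names, the three example categories, and the scores are made up for illustration and do not come from this project's code.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits, categories):
    """Return the most likely category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


# Hypothetical scores for one headline over three example categories.
categories = ["POLITICS", "SPORTS", "TECH"]
label, confidence = predict_category([0.2, 2.5, -1.0], categories)
```

In the full model the list of scores would have one entry per category rather than three.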

Screenshots

Dataset

The dataset used has been taken from Kaggle. Each record has the following fields:

category: Category in which the article was published.
headline: Headline of the article.
authors: Authors of the article.
link: Link to the original article.
short_description: Short description of the article.
date: Date when the article was published.

There are a total of 42 categories.
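
As a minimal sketch of how these fields might be read, the snippet below assumes the Kaggle download is a JSON Lines file (one JSON object per line). That file layout and the helper names `load_records` and `count_categories` are assumptions, not taken from this repository. Only the headline and category are kept, since those are the two fields the classifier needs.

```python
import json
from collections import Counter


def load_records(lines):
    """Parse JSON Lines text into (headline, category) pairs.

    Assumes one JSON object per line with the fields listed above;
    blank lines and records with an empty headline are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        headline = record.get("headline", "").strip()
        if headline:
            pairs.append((headline, record["category"]))
    return pairs


def count_categories(pairs):
    """Return a Counter mapping each category to its number of headlines."""
    return Counter(category for _, category in pairs)


# Tiny in-memory sample standing in for the real file (made-up records).
sample = [
    '{"category": "TECH", "headline": "New phone released", "authors": "", "link": "", "short_description": "", "date": "2020-01-01"}',
    '{"category": "SPORTS", "headline": "Local team wins final", "authors": "", "link": "", "short_description": "", "date": "2020-01-02"}',
    '{"category": "TECH", "headline": "", "authors": "", "link": "", "short_description": "", "date": "2020-01-03"}',
]
pairs = load_records(sample)
counts = count_categories(pairs)
```

With the real file, passing an open file handle to `load_records` would work the same way, and `len(counts)` would give the number of distinct categories found.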

Dataset Source

Metrics

The model achieved an accuracy of nearly 65% on both the validation and test data.
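
Accuracy here is the fraction of headlines whose predicted category equals the true one. The sketch below shows that calculation on made-up labels. The function name `accuracy` and the sample data are illustrative and are not the project's evaluation code.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up example: 3 of 4 predictions match the true labels.
y_true = ["TECH", "SPORTS", "POLITICS", "TECH"]
y_pred = ["TECH", "SPORTS", "TECH", "TECH"]
score = accuracy(y_true, y_pred)
```

For context, guessing uniformly at random over 42 categories would be correct about 1/42 of the time, roughly 2.4%.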

Citations

Misra, Rishabh. "News Category Dataset." arXiv preprint arXiv:2209.11429 (2022).
Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 9798585463570 (2021).

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Note

The license does not cover the dataset, which has been taken from Kaggle.

