Project Overview
Objective: Scrape Hindi news headlines and their article content from five news categories, build a custom tokenizer,
and fine-tune a BERT model for three-class classification.
Key Steps and Achievements
1 Data Collection:
Scraped Hindi news headlines and content from five different categories.
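A minimal sketch of the headline-extraction part of this step, assuming the headlines sit in <h2 class="headline"> tags. The tag, class name, sample HTML, and category label are hypothetical, since the source sites and their markup are not specified here. Only the standard-library HTML parser is used, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2 class="headline"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            text = "".join(self._buffer).strip()
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html, category):
    """Return one {"category", "headline"} record per headline found in html."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [{"category": category, "headline": h} for h in parser.headlines]


# Hypothetical page fragment standing in for a scraped category page.
sample_html = """
<div><h2 class="headline">भारत ने मैच जीता</h2>
<h2 class="headline">  नई टीम की घोषणा </h2>
<h2 class="other">विज्ञापन</h2></div>
"""
records = extract_headlines(sample_html, "sports")
```

Storing the category next to each headline keeps the label available for the later classification step.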
2 Tokenizer Development:
Built a custom tokenizer for the Hindi corpus.
Published the tokenizer on Hugging Face for public use.
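The tokenizer type is not stated above. As an illustration of how a subword vocabulary can be learned from a Hindi corpus, here is a toy byte-pair-encoding (BPE) trainer in plain Python. The three-sentence corpus, the function names, and the choice of BPE itself are assumptions; a published tokenizer would normally be trained with a dedicated library on the full scraped corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of pair in symbols with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to num_merges BPE merge rules from a list of sentences."""
    # Each whitespace-separated word starts as a tuple of single code points,
    # weighted by how often the word occurs in the corpus.
    words = Counter(tuple(w) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in training order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Tiny illustrative corpus; not the project's data.
corpus = ["भारत ने मैच जीता", "भारत की टीम", "भारत में चुनाव"]
merges = train_bpe(corpus, num_merges=3)
tokens = tokenize("भारतीय", merges)
```

Because merges are learned from pair frequencies, a word that recurs in the corpus ends up as a single token, while an unseen word falls back to smaller pieces that still concatenate to the original text.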
3 Model Fine-Tuning:
Fine-tuned a BERT model on the scraped dataset for three-class classification.
Achieved an accuracy of 0.9832.
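Accuracy here is the fraction of examples whose predicted class matches the true class. A small sketch of that computation for a three-class problem follows; the label ids and the example predictions are made up for illustration and are not the project's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels for three classes (0, 1, 2); not real results.
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2]
score = accuracy(y_true, y_pred)  # 7 of 8 predictions match
```

A single accuracy figure is easiest to interpret when it is reported together with the split it was measured on and the class balance of that split.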