Project Overview

Objective: Scrape Hindi news headlines and article content across five categories, build a custom tokenizer
for the resulting corpus, and fine-tune BERT for three-class classification.

Key Steps and Achievements

1. Data Collection:
    Scraped Hindi news headlines and content from five different categories.

2. Tokenizer Development:
    Built a custom tokenizer for the Hindi corpus.
    Published the tokenizer on the Hugging Face Hub for public use.

3. Model Fine-Tuning:
    Fine-tuned BERT on the scraped dataset for three-class classification.
    Achieved an accuracy of 0.9832 (98.32%).
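
Illustrative Sketch

The steps above do not say which tokenization algorithm was used or how accuracy was measured, so the following is only a minimal sketch under stated assumptions, not the project's actual code. It assumes a simple word-level vocabulary (the published tokenizer may well be subword-based), uses a made-up three-sentence corpus, and computes accuracy as the fraction of predictions that match the true labels. The names `CORPUS`, `build_vocab`, `encode`, and `accuracy` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; the real scraped dataset is not part of this README.
CORPUS = [
    "भारत ने मैच जीता",
    "भारत में चुनाव आज",
    "नई फिल्म आज रिलीज",
]


def build_vocab(texts, min_freq=1):
    """Map each whitespace-separated word to an integer id.

    Word-level splitting is an assumption made for illustration only.
    """
    counts = Counter(word for text in texts for word in text.split())
    vocab = {"[PAD]": 0, "[UNK]": 1}
    # Sort by descending frequency, then by code point, so ids are deterministic.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a sentence to ids, mapping unseen words to [UNK]."""
    return [vocab.get(word, vocab["[UNK]"]) for word in text.split()]


def accuracy(predictions, labels):
    """Fraction of predicted class ids that equal the true class ids."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


vocab = build_vocab(CORPUS)
ids = encode("भारत ने चुनाव जीता", vocab)   # every word was seen in CORPUS
unknown = encode("अज्ञात", vocab)           # unseen word falls back to [UNK]

# Three classes are encoded here as 0, 1, 2; 3 of the 4 predictions are correct.
score = accuracy([0, 1, 2, 2], [0, 1, 2, 1])
```

In a real run, the ids would come from the published tokenizer and the predictions from the fine-tuned BERT classifier; the accuracy function itself would stay the same.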