The Indian corpus in NLTK is a collection of part-of-speech-tagged text in four Indian languages: Bengali, Hindi, Marathi, and Telugu. NLTK (Natural Language Toolkit) is a Python library for natural language processing.
- Imported NLTK (Natural Language Toolkit).
- Imported the Indian corpus from NLTK.
- Selected the Bengali file of the Indian corpus, which has the file ID 'bangla.pos'.
- Stored the file ID 'bangla.pos' in the variable 'tagged_set'.
- Stored the Bengali sentences from that file in the variable 'word_set'.
- Used a for loop to count the number of sentences in the Bengali corpus.
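
The steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the variable names `tagged_set` and `word_set` come from the steps, while the helper functions `count_sentences` and `count_bengali_sentences` are hypothetical names introduced here. It assumes the corpus can be fetched with `nltk.download("indian")`.

```python
def count_sentences(sentences):
    """Count the items in an iterable of sentences with an explicit for loop."""
    count = 0
    for _ in sentences:
        count += 1
    return count


def count_bengali_sentences(tagged_set="bangla.pos"):
    """Load the Bengali sentences from NLTK's Indian corpus and count them.

    NLTK is imported inside the function so that count_sentences() can be
    used on its own without NLTK installed (a choice made for this sketch).
    """
    import nltk
    from nltk.corpus import indian

    # Fetch the Indian corpus if it is not already available locally.
    nltk.download("indian", quiet=True)

    # 'tagged_set' holds the file ID; 'word_set' holds the sentences,
    # each sentence being a list of word tokens.
    word_set = indian.sents(tagged_set)
    return count_sentences(word_set)
```

Calling `count_bengali_sentences()` in a Colab or Jupyter cell returns the sentence count for 'bangla.pos'. Passing another file ID from `indian.fileids()` would count the sentences of a different language in the same way.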
- Google Colab/Jupyter
- Language: Python
- NLTK Library
Prof. Sandipan Ganguly
Rajdeep Das