This is a text classification project built on the 20 Newsgroups dataset, which consists of approximately 20,000 newsgroup posts (messages) taken from 20 newsgroups. The dataset is divided into two parts: one for training (or development) and one for testing (or performance evaluation).
The classifiers are trained on the training set and then used to classify the posts in the test set. Their predictions are compared with the true labels of the test posts, and the resulting scores, such as accuracy, are computed.
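The train, predict, and score loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which classifiers the project uses, so a small multinomial Naive Bayes classifier stands in for them, and a handful of made-up posts stand in for the real training and test splits. The function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) are hypothetical and not taken from the project.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and word counts from the training set."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy posts standing in for the real training and test splits.
train_docs = [
    "the team won the hockey game",
    "the goalie made a great save",
    "the rocket reached orbit today",
    "nasa launched a new satellite",
]
train_labels = ["rec.sport.hockey", "rec.sport.hockey", "sci.space", "sci.space"]
test_docs = ["the hockey team lost the game", "a satellite in orbit"]
test_labels = ["rec.sport.hockey", "sci.space"]

model = train_naive_bayes(train_docs, train_labels)      # fit on training data only
predictions = [predict(model, doc) for doc in test_docs]  # classify unseen posts
score = accuracy(test_labels, predictions)                # compare with true labels
print(predictions, score)
```

The key point is the separation of the two splits: the model only ever sees `train_docs` while fitting, so the accuracy computed on `test_docs` estimates how well it generalizes to unseen posts.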