- Perform sentiment analysis by applying Maximum Entropy classification to movie review data.
- Observe the effect on accuracy of stop words, punctuation, lemmatization, and the amount of training data used.
- Analyze unbalanced collections by varying the proportions of positive and negative samples in the training data.
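
The objectives above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes binary presence features and fits a two-class Maximum Entropy model (equivalent to logistic regression) by plain batch gradient ascent. The names `STOP_WORDS`, `tokenize`, `train_maxent`, and `predict` are hypothetical, the stop-word list is a tiny placeholder, and lemmatization is left out because it needs an external lexical resource.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "it", "this"}


def tokenize(text, drop_stop_words=False, drop_punctuation=False):
    """Lowercase and split on whitespace, optionally filtering tokens."""
    if drop_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit a binary MaxEnt model by batch gradient ascent on the log-likelihood.

    docs: list of token lists; labels: 0 (negative) or 1 (positive).
    Returns a Counter mapping feature -> weight; "__bias__" holds the bias.
    """
    weights = Counter()
    for _ in range(epochs):
        grad = Counter()
        for tokens, y in zip(docs, labels):
            feats = set(tokens) | {"__bias__"}  # binary presence features
            score = sum(weights[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-score))  # P(positive | document)
            for f in feats:
                grad[f] += y - p  # observed minus expected feature count
        for f, g in grad.items():
            weights[f] += lr * g / len(docs)
    return weights


def predict(weights, tokens):
    """Return 1 (positive) if the model score is above zero, else 0."""
    feats = set(tokens) | {"__bias__"}
    return 1 if sum(weights[f] for f in feats) > 0 else 0


# Toy data, invented purely for illustration.
reviews = [
    ("A great, moving film!", 1),
    ("Great acting and a great plot.", 1),
    ("A dull, boring film.", 0),
    ("Boring plot and dull acting.", 0),
]
train_docs = [tokenize(text, drop_punctuation=True) for text, _ in reviews]
train_labels = [label for _, label in reviews]
model = train_maxent(train_docs, train_labels)
```

Switching the `drop_stop_words` and `drop_punctuation` flags before training gives a simple way to compare preprocessing variants on the same data.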
The following case studies were proposed:
Case Study I:
Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, using all words and equal proportions of positive and negative examples.
Case Study II:
Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, using the top 500 words and equal proportions of positive and negative examples.
Case Study III:
Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, using the top 1000 words and equal proportions of positive and negative examples.
Case Study IV:
Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, using all words and unequal proportions of negative and positive examples.
Case Study V:
Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, using all words and only negative examples.
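
The case studies differ in two settings: how many words are kept as features (all, top 500, top 1000) and the mix of positive and negative training examples. The sketch below shows one possible way to set up both. It assumes "top N words" means the N most frequent tokens in the training documents, which the outline does not state explicitly, and all function names are hypothetical.

```python
import random
from collections import Counter


def top_k_vocabulary(docs, k=None):
    """Return the k most frequent tokens across docs (all tokens if k is None)."""
    counts = Counter(token for tokens in docs for token in tokens)
    return {token for token, _ in counts.most_common(k)}


def restrict_to_vocabulary(docs, vocabulary):
    """Drop every token that is not in the chosen vocabulary."""
    return [[t for t in tokens if t in vocabulary] for tokens in docs]


def sample_training_set(positive, negative, positive_fraction, size, seed=0):
    """Draw `size` documents with the requested share of positive examples.

    positive_fraction=0.5 gives a balanced set (Case Studies I-III),
    other values an unbalanced one (IV), and 0.0 only negatives (V).
    Returns (docs, labels) with label 1 for positive and 0 for negative.
    """
    rng = random.Random(seed)
    n_pos = round(size * positive_fraction)
    n_neg = size - n_pos
    if n_pos > len(positive) or n_neg > len(negative):
        raise ValueError("not enough documents for the requested proportions")
    docs = rng.sample(positive, n_pos) + rng.sample(negative, n_neg)
    labels = [1] * n_pos + [0] * n_neg
    return docs, labels
```

For example, `top_k_vocabulary(train_docs, 500)` followed by `restrict_to_vocabulary` would correspond to the Case Study II setting under the frequency assumption above.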