People's interactions with negative comments have increased over the years, especially now that nearly everyone is on social media. In 2014, a Pew Research Center study found that about 66 percent of internet users who had experienced online harassment said their most recent incident occurred on a social networking site or app. Many of these comments could be automatically filtered out or rewritten to be positive, giving people a better experience online. That is the motivation for this research: we developed natural language processing techniques that flag negative comments and then transform them into positive ones.
Naive Bayes overfit the data, so Random Forest classification was used instead. Random Forest was better suited to the goal because it let us weigh the positive and negative sentences accurately during training. In testing, the comparison of Naive and Smart vectorization gave the expected results on the easier sentences: the Naive algorithm reached only 40 percent accuracy, while the Smart algorithm achieved 90 percent. These results are expected because Smart vectorization was designed to preserve proper sentence syntax. However, as the difficulty of the sentences rose, the Naive algorithm kept pace with the Smart algorithm, misclassifying only one more sentence. A possible reason is that the longer and more complicated a sentence gets, the more words must be taken into consideration when constructing a syntactically proper sentence. When sentences are as short as social media comments usually are, the algorithm is a viable solution for censoring swear words.
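The weighting of positive and negative sentences during training can be illustrated with a minimal sketch. The text does not say how the weights were chosen, so the inverse-frequency heuristic below is an assumption, and the function name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Assumption: inverse-frequency weighting, a common heuristic for
    imbalanced classes. The original work may have weighted differently.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels: 2 negative comments and 6 positive ones.
labels = ["negative", "negative"] + ["positive"] * 6
weights = balanced_class_weights(labels)

# The rarer class receives the larger weight, so each class contributes
# equally in total when a classifier is trained with these weights.
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

A dictionary of this shape could, for example, be supplied as the `class_weight` argument of scikit-learn's `RandomForestClassifier`, although whether the original experiments were set up this way is not stated.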