Disclaimer: The machine learning model and hashtag categorization were trained on datasets containing hate speech and slurs. Files containing hate speech and other negative keywords have been omitted from the repository.
This project is based on Co:Here's challenge to create a trained sentiment and language processing application. It is also inspired by the Berggruen Institute's challenge to create a product that facilitates deliberation in a community by presenting current political trends in tweets. We used Tweepy, a Python client for the Twitter API, to compare current tweets against datasets we gathered from the internet. These datasets are referenced at the end of this document.
Cohere helps developers generate or analyze text to do things like write and summarize copy, moderate content, classify data and more, all at a massive scale. No matter your level of experience, Cohere's API makes it easy to build machine learning and state-of-the-art language AI into your application.
Build something awesome that showcases the best use of Cohere's API for a chance to win some incredible prizes!
Build a program that facilitates deliberation and decision-making among residents in a community. First, you must identify the problem you want to solve and substantiate your choice by demonstrating it is of concern to the community you have selected. Then you need to build an app that will recruit a randomly selected, representative sample of the population (you can do this via web scraping). Finally, you will populate the program with the information the recruited representatives need to make an informed decision about the problem, and you will facilitate a voting or decision-making process in an unbiased way. Our political institutions are suffering from declining trust and legitimacy, and it is time we think about ways to reinvent democracy. Among the experiments being developed are new institutions for citizen engagement, but few of them are scalable. We need to leverage technology to harness the collective intelligence of citizens in solving shared problems in their communities.
- Determines political sentiment of tweets using Co:Here sentiment analysis technology.
- Finds trends in hashtags used by political figures.
- Categorizes tweets as extremist and places them on the political spectrum.
- Uses a machine learning model to recognize hate speech.
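
As an illustration of the hashtag trend feature listed above, here is a minimal sketch of counting the most common hashtags across a batch of tweet texts. It is not the project's actual implementation: the function name `top_hashtags`, the regular expression, and the sample tweets are all assumptions made for this example. In the real application the tweet texts would presumably come from Tweepy rather than a hard-coded list.

```python
import re
from collections import Counter

# Assumed pattern: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most common hashtags (lowercased) across tweet texts.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for text in tweets:
        # Count each hashtag once per tweet so a single tweet that
        # repeats a tag cannot dominate the trend.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts.most_common(n)


# Made-up sample data, not taken from any of the referenced datasets.
sample_tweets = [
    "Vote early! #Election2022 #Vote",
    "Town hall tonight #vote",
    "New bill on housing #Housing #Election2022",
    "Get out and #VOTE",
]

print(top_hashtags(sample_tweets, n=2))
```

Lowercasing the tags treats `#Vote` and `#VOTE` as the same trend, which is one possible design choice; a case-sensitive count would need only the `.lower()` call removed.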
---
As previously stated, the files used to train the sentiment analysis AI have been omitted. If you wish to access these datasets, use the links below. Please be advised that these datasets are a compilation of hateful beliefs shared on Twitter.
- Samoshyn, A. (2020). Hate Speech and Offensive Language Dataset (Version 1) [Data set]. https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset
- Kash. (2022). Global Political tweets (Version 22) [Data set]. https://www.kaggle.com/datasets/kaushiksuresh147/political-tweets
- Kazanova, M. (2018). Sentiment140 dataset with 1.6 million tweets (Version 2) [Data set]. https://www.kaggle.com/datasets/kazanova/sentiment140