This project analyses tweets from pharmaceutical companies using NLP methods in Python 3. The idea came after I stumbled upon the Harvard Business Review Twitter empathy scores from 2015 and their article "50 Companies That Get Twitter – and 50 That Don’t" by Belinda Parmar. I noticed that my company ranked last out of 300 and decided to find out why for myself.
- Found a business problem to work on, based on "50 Companies That Get Twitter – and 50 That Don’t": have any of the companies discovered how to communicate better on Twitter? What are the industry "standards" on this social media channel?
- Scraped data from 15 highest-earning pharmaceutical companies' Twitter accounts
- Cleaned data
- Explored data and created a document in Jupyter Notebook to show the code, outputs and my comments
- Visualised the outcomes
- Presented the results to the stakeholders at my company
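
The "Cleaned data" step above could look something like the sketch below. The project's actual cleaning rules are not listed here, so the patterns and the `clean_tweet` helper are assumptions for illustration only; it uses just the standard-library `re` module.

```python
import re

# Hypothetical patterns; the notebook's real cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and punctuation."""
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @handles
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)   # drops '#' and punctuation, keeps the hashtag word
    return " ".join(text.split())       # collapse repeated whitespace


print(clean_tweet("Proud to support #WorldHealthDay with @WHO! https://example.com/x"))
# prints: proud to support worldhealthday with
```

Keeping the hashtag word while dropping the `#` sign is one possible design choice; hashtags could equally be removed or analysed separately.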
data_analysis - contains the Jupyter Notebook document and code that created a wordcloud shaped like AZ's logo
web_scraping - contains code used to scrape tweets of selected companies
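
As a rough illustration of what the analysis code works with, the sketch below counts word frequencies across a handful of tweets. The `word_frequencies` helper, the tiny stop-word set and the sample tweets are all made up for this example and are not taken from the notebook. A dictionary like the one it returns can be passed to `WordCloud.generate_from_frequencies` from the `wordcloud` package to draw the cloud.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); nltk's stop-word corpus is a fuller option.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "our", "the", "to", "we", "with"}


def word_frequencies(tweets, top_n=None):
    """Count non-stop-word tokens across an iterable of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    # most_common(None) returns every word, ordered from most to least frequent
    return dict(counts.most_common(top_n))


# Made-up sample tweets, not real scraped data
sample = [
    "Supporting patients and the science behind new medicines",
    "New science, new hope for patients",
]
freqs = word_frequencies(sample)
print(freqs["new"], freqs["patients"])  # prints: 3 2
```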
import pandas as pd
import numpy as np
import tweepy
import csv
import time
import re
import os
import wordcloud
import matplotlib.pyplot as plt
import nltk
import PIL
import scipy
import collections
import seaborn as sns
- https://en.wikipedia.org/wiki/List_of_largest_pharmaceutical_companies_by_revenue
- Twitter accounts of the 15 highest-earning companies
- https://developer.twitter.com/en/docs