Link to Google Slides presentation
The purpose of this project is to analyze and predict the level of vaccine hesitancy in the United States in late 2021, a year after the outbreak of the COVID pandemic. We will apply statistical calculations and supporting libraries to the datasets to identify and display trends and to predict outcomes.
- Which counties will rank hesitant versus strongly hesitant to vaccination based on local public mask mandates, 'stay-at-home orders', 'gathering bans', and the prevalence of certain health conditions?
- Which counties will rank hesitant versus strongly hesitant based on CVAC levels, which measure a county's ability to handle a COVID outbreak?
- Which counties will rank hesitant versus strongly hesitant based on SVI levels, which measure a county's social vulnerability to disaster?
- Which counties will rank hesitant versus strongly hesitant based on the local COVID transmission level?
https://data.cdc.gov/Vaccinations/Vaccines-gov-COVID-19-vaccinating-provider-locatio/5jp2-pgaw
https://data.cdc.gov/Vaccinations/Fully-Vaccinated-Adults/jm79-dz78
https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh
Software & Libraries | |
---|---|
Data Retrieval, Cleaning, and Analysis: | Python, API |
Database Management/Storage: | PostgreSQL |
Predictive Analysis: | ML |
Data Visualization: | Tableau |
- Data Cleaning and Analysis: Python with the Pandas library will be used to clean the data and perform an exploratory analysis. Further trend analysis will be completed using NumPy.
- Database Storage: PostgreSQL is the database we intend to use, and we will use pgAdmin to display the data.
- Machine Learning: scikit-learn is the ML library we'll use to create a regressor. We will split the data into training and testing datasets using a 75-25 split. Afterwards, we will compare the regression models listed below to validate our machine learning results.
  - Regression Tree: A regression tree works like a decision tree classifier. It recursively partitions the records in the training set into branches until a simple model can fit each partition. It is suitable for non-linear regression, where interactions between features are complicated.
  - Linear Regression: Linear regression is a supervised machine learning model that finds the best-fit line between the independent and dependent variables.
  - Support Vector Regression (SVR): This model adds flexibility to linear regression by tolerating errors within an acceptable margin while finding a hyperplane that fits the data.
- Dashboard: Tableau will be used for the final report and for a fully functioning, interactive dashboard. Finally, the finished data visualizations will be hosted on Google Slides.
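
The Machine Learning item above (a 75-25 split followed by three regressors) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (generated with `make_regression`) and stands in for the county-level features and hesitancy target, and the hyperparameters (`max_depth`, `C`, `epsilon`) and the use of R² as the comparison metric are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features/target (assumption, not project data).
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)

# 75-25 train/test split, as described in the Machine Learning item.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three regressors named above; hyperparameters are illustrative only.
models = {
    "Regression Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Linear Regression": LinearRegression(),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0, epsilon=0.1)),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Scoring all three models on the same held-out 25% keeps the comparison fair. With real county data, the synthetic `X` and `y` would be replaced by columns loaded from the database.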
Our team members will present a provisional machine learning model that stands in for the final machine learning model and accomplishes the following:
- Takes in raw data from the provisional database
- Outputs label(s) for input data
- Manages repositories rendered by team members
Our team members handling Data Management will present a provisional database that stands in for the final database and accomplishes the following:
- Images of code
- Sample data that mimics the expected final database structure/schema
- Provisional draft of machine learning module with code showing connection to the provisional database
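
As a rough sketch of what "takes in raw data from the provisional database" and "outputs label(s)" might look like, the example below reads a small table into Pandas and attaches a label to each row. Everything specific here is an assumption for illustration: an in-memory SQLite database stands in for PostgreSQL so the snippet runs without a server, the table name `county_hesitancy`, its columns, and the sample rows are made up, and the fixed `THRESHOLD` rule is only a placeholder for the provisional model.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the provisional PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE county_hesitancy ("
    "county_id TEXT PRIMARY KEY, svi REAL, cvac REAL, pct_hesitant REAL)"
)

# Made-up sample rows that only mimic a possible schema; not real data.
sample_rows = [
    ("C001", 0.20, 0.30, 0.08),
    ("C002", 0.75, 0.80, 0.22),
    ("C003", 0.50, 0.45, 0.12),
]
conn.executemany("INSERT INTO county_hesitancy VALUES (?, ?, ?, ?)", sample_rows)
conn.commit()

# Pull the raw data from the database into a DataFrame.
df = pd.read_sql_query("SELECT * FROM county_hesitancy ORDER BY county_id", conn)
conn.close()

# Placeholder labeling rule standing in for the provisional model (assumption).
THRESHOLD = 0.15
df["label"] = df["pct_hesitant"].apply(
    lambda p: "strongly hesitant" if p >= THRESHOLD else "hesitant"
)
print(df[["county_id", "pct_hesitant", "label"]])
```

Against the real PostgreSQL database, the same `read_sql_query` call would take a PostgreSQL connection in place of the SQLite one.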