GitHub - UNCG-CSE/SP-21_COVID-Team_4

Covid Data Analysis - Group Project (CSC-405-605_Spring_2021)

An analytical system to understand and visualize the patterns of COVID-19 effect and spread across different counties of the United States

Project Description

This project examines COVID-19 data in the United States to identify possible trends in the virus’s spread. This involves analyzing the number of cases, the number of deaths, and the population of each county. This data is combined with several enrichment datasets, such as hospital beds, presidential election results, employment, economic characteristics, and demographic information.

Combining the COVID-19 dataset with the enrichment datasets helps reveal patterns in how the number of cases and deaths changes and how those changes correlate with different factors. Using these data, linear and non-linear regression models are developed to predict the number of cases and deaths due to COVID-19 in the United States. The data is processed using statistical models and presented using graphs with a trend line, confidence intervals, and a prediction path. Ultimately, a simple interactive dashboard is created based on the analysis, where users can visualize the present trend, predictions, moving averages, and more.

Stage I - Data and Project Understanding

This is the first stage of the project, where we get acquainted with the COVID-19 dataset. The data is provided by USAFacts; we use its daily county-level tracker of COVID-19 cases in the US. You can use the link below to download the granular data from USAFacts.

COVID-19 Dataset
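As a rough illustration of getting acquainted with a county-level tracker, the sketch below turns cumulative counts into daily new cases. It assumes a wide layout (one row per county, one column per date) holding cumulative totals. The column names, the sample values, and the helper `daily_new_cases` are hypothetical and are not taken from the project's notebooks.

```python
import csv
import io

# Hypothetical three-day excerpt in a wide layout (one row per county,
# one column per date). Column names and values are assumptions.
SAMPLE_CSV = """countyFIPS,County Name,State,2021-01-01,2021-01-02,2021-01-03
37081,Guilford County,NC,100,130,125
37067,Forsyth County,NC,80,80,95
"""

# Columns that identify the county rather than hold a daily count.
ID_COLUMNS = ("countyFIPS", "County Name", "State")


def daily_new_cases(csv_text):
    """Map each county FIPS code to a list of (date, new_cases) pairs."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        dates = [name for name in row if name not in ID_COLUMNS]
        cumulative = [int(row[d]) for d in dates]
        # Difference consecutive cumulative totals. A downward revision
        # would give a negative value, so it is clamped to zero here.
        new = [max(curr - prev, 0)
               for prev, curr in zip(cumulative, cumulative[1:])]
        result[row["countyFIPS"]] = list(zip(dates[1:], new))
    return result


if __name__ == "__main__":
    for fips, series in daily_new_cases(SAMPLE_CSV).items():
        print(fips, series)
```

Clamping negative differences to zero is only one way to handle revised totals; keeping them as-is or flagging them for review would be equally reasonable.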

Stage II - Data Modeling and Hypothesis Testing

This is the second stage of the project, where we dig deeper into data modeling and hypothesis testing. Building on the preliminary intuitions from Stage I, we develop a formal hypothesis and use statistical modeling to support or reject it. We compare weekly statistics using the mean, median, and mode of our three main variables and plot the daily trends in a meaningful way. We also search for correlations between different features. Additionally, we compare the data for the United States against other countries using the worldwide dataset.
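The weekly comparison and the correlation search described above can be sketched with the standard library alone. The function names `weekly_summary` and `pearson` and the sample numbers are illustrative assumptions, not the project's actual code.

```python
import statistics
from math import sqrt


def weekly_summary(daily_values):
    """Return (mean, median, mode) for each complete 7-day block."""
    summaries = []
    for start in range(0, len(daily_values) - 6, 7):
        week = daily_values[start:start + 7]
        # Note: on Python 3.7, statistics.mode raises StatisticsError
        # when a week has no single most common value.
        summaries.append((
            statistics.mean(week),
            statistics.median(week),
            statistics.mode(week),
        ))
    return summaries


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical daily new-case counts for one week.
    cases = [10, 12, 12, 15, 18, 20, 25]
    print(weekly_summary(cases))
    # Hypothetical feature pair with a perfectly linear relationship.
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

Incomplete trailing weeks are dropped so that every summary covers exactly seven days.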

COVID-19 Worldwide Dataset

Worldwide Dataset

Stage III - Basic Machine Learning

In this stage, we develop linear and non-linear regression models for predicting cases and deaths in the United States. Machine learning and statistical models are used to predict the trend of COVID-19 cases and deaths. We also plot a trend line and forecast one week ahead. Confidence intervals are introduced to analyze the error in the predictions. Finally, we test the hypothesis formulated in Stage II of the project.
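To make the trend line, one-week forecast, and interval idea concrete, here is a minimal sketch of an ordinary least squares line fitted to daily counts. It is not the project's model: the function names are hypothetical, and the interval uses the normal quantile 1.96 as an approximation where a t quantile would be more exact for short series.

```python
from math import sqrt


def fit_line(ys):
    """Least squares fit of y = intercept + slope * day, day = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_with_interval(ys, days_ahead=7, z=1.96):
    """Return (prediction, lower, upper) for each of the next days."""
    n = len(ys)
    intercept, slope = fit_line(ys)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(ys)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(r * r for r in residuals) / (n - 2))
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    forecasts = []
    for x in range(n, n + days_ahead):
        pred = intercept + slope * x
        # Approximate 95% prediction interval; it widens as x moves
        # further away from the observed days.
        half = z * s * sqrt(1 + 1 / n + (x - mean_x) ** 2 / sxx)
        forecasts.append((pred, pred - half, pred + half))
    return forecasts


if __name__ == "__main__":
    # Hypothetical daily case counts for ten days.
    history = [100, 108, 109, 118, 119, 127, 131, 133, 141, 148]
    for pred, low, high in forecast_with_interval(history):
        print(round(pred, 1), round(low, 1), round(high, 1))
```

One simple non-linear variant is to fit the same line to the logarithm of the counts and exponentiate the predictions, which corresponds to an exponential trend.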

Stage IV - Dashboard

Using frameworks such as Plotly and Dash, we develop an interactive dashboard for the fourth and final stage of the project. The dashboard lets the user filter the data by date(s), by state, and by linear or log normalization, and presents the resulting analysis in a graph. This graph is interactive and easy to interpret.
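The filtering step behind such a dashboard can be sketched as a plain function that a callback might call. This sketch does not use Dash or Plotly itself; the record layout, the sample values, and `select_series` are hypothetical, and mapping zero counts to 0.0 on the log scale is an assumption made only to keep the example total.

```python
from datetime import date
from math import log10

# Hypothetical records: (state, day, new_cases).
RECORDS = [
    ("NC", date(2021, 1, 1), 1000),
    ("NC", date(2021, 1, 2), 1200),
    ("VA", date(2021, 1, 1), 800),
    ("VA", date(2021, 1, 2), 0),
]


def select_series(records, states, start, end, scale="linear"):
    """Filter by state and inclusive date range, then apply the scale."""
    if scale not in ("linear", "log"):
        raise ValueError("scale must be 'linear' or 'log'")
    selected = []
    for state, day, value in records:
        if state in states and start <= day <= end:
            if scale == "log":
                # log10(0) is undefined, so zero counts are shown as 0.0.
                value = log10(value) if value > 0 else 0.0
            selected.append((state, day, value))
    return selected


if __name__ == "__main__":
    first, last = date(2021, 1, 1), date(2021, 1, 2)
    print(select_series(RECORDS, {"NC"}, first, last))
    print(select_series(RECORDS, {"NC", "VA"}, first, last, scale="log"))
```

Keeping the filter free of any UI code makes it easy to check on its own before wiring it to date pickers, state selectors, and a scale toggle.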

This document describes the dashboard in more detail and provides several snapshots, each with a short description of what it represents.

Technologies:

Python: 3.7
Jupyter Notebook: 6.2.0

Project Setup:

To run this project locally, make sure you have Python, pip, and Jupyter Notebook installed. You will also need some additional Python libraries, which you can install with the command below. From the project's parent directory, run:

pip install -r requirements.txt

To open jupyter notebook, while you are in the project's parent directory, run:

jupyter notebook

Project Status

Stage I: Complete

Stage II: Complete

Stage III: Complete

Stage IV: Complete

