This is a collaborative data analysis project investigating the COVID-19 dataset maintained in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. It is implemented in R and was submitted as a group project for STAT 3355: Data Analysis for Statisticians and Actuaries at The University of Texas at Dallas. Our final results consist of a presentation and a report, both available in this repository.
Our driving questions focus on identifying public health and socioeconomic factors associated with the spread and lethality of COVID-19, and on predicting the future course of COVID-19 in the state of Texas in terms of caseloads and vaccine administrations.
Our primary dataset is the CSSE dataset, which provides data on cases. Secondary sources of public health data (e.g. vaccines, pre-existing conditions, socioeconomic status, population density) are listed below.
exploratory/
- Data exploration and pattern discovery code
presentation/
- Presentation code and slides as delivered in class
proposal/
- Initial proposal containing project ambitions as submitted
report/
- Final report and code as submitted
- JHU CCI CRC COVID-19 Data
- Texas DSHS COVID-19 Case Fatality and Demographics
- CDC Research Study (“Underlying Medical Conditions and Severe Illness Among 540,667 Adults Hospitalized With COVID-19, March 2020–March 2021”)
- Land Area from The County Information Program of Texas Association of Counties
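
As a rough illustration of how the case data might be prepared for the Texas analysis, the sketch below collapses a CSSE-style wide table (one column per date) into a daily series for Texas. It assumes the column layout of the CSSE US time-series files (`Admin2`, `Province_State`, and date columns named like `1/22/20`). The helper name `texas_daily_cases` and the toy counts are hypothetical and are not taken from the project code.

```r
# Hypothetical helper: sum a CSSE-style wide table over Texas counties and
# return one row per date with cumulative and daily new cases.
texas_daily_cases <- function(wide) {
  tx <- wide[wide$Province_State == "Texas", , drop = FALSE]
  # Date columns are assumed to look like "1/22/20" (month/day/2-digit year).
  date_cols <- grep("^[0-9]+/[0-9]+/[0-9]+$", names(tx), value = TRUE)
  cumulative <- colSums(tx[, date_cols, drop = FALSE])
  out <- data.frame(
    date = as.Date(date_cols, format = "%m/%d/%y"),
    cumulative_cases = as.numeric(cumulative)
  )
  out <- out[order(out$date), ]
  # Daily new cases are the first differences of the cumulative series.
  out$new_cases <- c(out$cumulative_cases[1], diff(out$cumulative_cases))
  rownames(out) <- NULL
  out
}

# Toy input with made-up counts, only to show the expected shape.
toy <- data.frame(
  Admin2 = c("Dallas", "Collin", "Kings"),
  Province_State = c("Texas", "Texas", "New York"),
  `1/22/20` = c(0, 0, 1),
  `1/23/20` = c(2, 1, 4),
  `1/24/20` = c(5, 1, 9),
  check.names = FALSE
)
tx_cases <- texas_daily_cases(toy)
```

When reading a real file with `read.csv`, passing `check.names = FALSE` keeps the date column names in their original `m/d/yy` form, which is what the pattern match above relies on.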
- ggplot2
- lubridate
- dplyr
- plyr
- gridExtra
- cowplot
- grid
- gapminder
- readxl
- UsingR
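
One possible way to check for and attach the R packages listed above is sketched here. This is a convenience snippet, not part of the submitted project code; the variable names `required_packages` and `missing_packages` are illustrative. It attaches `plyr` before `dplyr`, which is an assumption about the preferred order, made because attaching `plyr` afterwards masks several `dplyr` functions.

```r
# Packages named in this README. `grid` ships with base R, so it is never
# reported as missing and does not need to be installed from CRAN.
required_packages <- c(
  "ggplot2", "lubridate", "plyr", "dplyr", "gridExtra",
  "cowplot", "grid", "gapminder", "readxl", "UsingR"
)

# Report which packages are not yet installed, without attaching anything.
missing_packages <- required_packages[
  !vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
]

if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)  # uncomment to install from CRAN
} else {
  # plyr is attached before dplyr so that dplyr's verbs (summarise, mutate,
  # ...) are not masked by plyr's functions of the same name.
  for (pkg in required_packages) {
    library(pkg, character.only = TRUE)
  }
}
```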
Michael Tsang, Kevin Jin, & Mingyu Sun