This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.
The libraries used in this study are library(datasets), library(readr), library(dplyr), library(ggplot2), library(ggrepel), library(kableExtra), library(tidyverse), library(scales), library(choroplethr), and library(choroplethrMaps). To run this code, please ensure that you have these packages installed.
Opioid prescribing has drawn wide attention because prescriptions have been increasing, along with concerns about overdose, opioid-related deaths, and other opioid-associated side effects. Our motivation is to understand opioid prescribing in the United States using publicly available data.
- Among the 50 States and the District of Columbia, which five States had the highest and lowest values of
  (a) total opioid claims,
  (b) opioid prescription rate (= total opioid claims / total claims), and
  (c) opioid prescriptions per 100 enrollments?
- Did the total number of opioid prescriptions change over time?
- Can you visualize the opioid prescription rate on a U.S. map?
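As a rough illustration of the measures in the first question, the sketch below computes them on a made-up data frame. The column names (state, opioid_claims, total_claims, enrollment) and all values are hypothetical and are not taken from the CMS data. The formula for measure (c) is our assumed reading of "per 100 enrollments".

```r
library(dplyr)

# Hypothetical toy data: names and numbers are invented for illustration only
claims <- data.frame(
  state         = c("A", "B", "C", "D"),
  opioid_claims = c(500, 300, 900, 100),
  total_claims  = c(10000, 4000, 30000, 1000),
  enrollment    = c(2000, 1000, 6000, 250),
  stringsAsFactors = FALSE
)

metrics <- claims %>%
  mutate(
    # (b) opioid prescription rate = total opioid claims / total claims
    opioid_rate    = opioid_claims / total_claims,
    # (c) assumed definition: opioid claims per 100 enrollments
    opioid_per_100 = 100 * opioid_claims / enrollment
  )

# Two States with the highest prescription rate (the case study asks for five)
top_rate <- metrics %>%
  arrange(desc(opioid_rate)) %>%
  head(2)

# Two States with the lowest total opioid claims
low_total <- metrics %>%
  arrange(opioid_claims) %>%
  head(2)

print(top_rate)
print(low_total)
```

Note that the ranking by total claims and the ranking by rate can differ: in this toy data, State C has the most opioid claims but the lowest rate.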
The Centers for Medicare & Medicaid Services (CMS) has prepared a public dataset with Medicare Part D information covering calendar years 2013 through 2016, to make our health care system more transparent and to allow citizens to understand medical expenditure and relevant health issues (Medicare Part D dataset). To search for opioid-relevant CMS data, you can type "opioid" on the CMS website. Details about the data acquisition are described in this project.
For learning purposes, links for downloading the data are provided, along with the method used to find the links.
We also use library(datasets) for State information. Details can be found in our other tutorial [DOI:10.5281/zenodo.2565307].
We use the R package library(readr) for data import in this tutorial.
Three R packages, namely library(tidyr), library(dplyr), and library(kableExtra), are used for data wrangling in this tutorial. library(tidyr) and library(dplyr) provide the main functions for data wrangling, and library(kableExtra) lets us view the data in an organized way.
We also demonstrate some other useful functions for data wrangling, including selecting columns using select(), selecting rows using filter(), arranging or re-ordering rows using arrange(), joining two datasets using the join functions (for example left_join()), adding columns using mutate(), creating summaries of columns using summarise(), and grouping operations using group_by(). Details can be found in our other tutorial [DOI:10.5281/zenodo.2565307].
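The wrangling verbs listed above can be chained in a single pipeline. The following is a minimal sketch on invented data; the tables rx and regions, their column names, and their values are hypothetical and only serve to show how the verbs fit together.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
rx <- data.frame(
  state         = c("A", "A", "B", "B"),
  year          = c(2015, 2016, 2015, 2016),
  opioid_claims = c(120, 100, 80, 45),
  total_claims  = c(1000, 1000, 800, 900),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  state  = c("A", "B"),
  region = c("South", "West"),
  stringsAsFactors = FALSE
)

summary_tbl <- rx %>%
  select(state, year, opioid_claims, total_claims) %>%   # keep needed columns
  filter(year == 2016) %>%                               # keep 2016 rows only
  left_join(regions, by = "state") %>%                   # add region per State
  mutate(opioid_rate = opioid_claims / total_claims) %>% # add a rate column
  group_by(region) %>%                                   # group by region
  summarise(mean_rate = mean(opioid_rate)) %>%           # one row per region
  arrange(desc(mean_rate))                               # highest rate first

print(summary_tbl)
```

We use left_join() here so that every row of rx is kept even if a State were missing from regions; other join functions may suit other situations.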
For exploratory analysis, we use data visualization. ggplot2 is the R package we demonstrate in this tutorial, and choroplethrMaps is the R package we use to help create the map.
A basic explanation of how to create plots using ggplot(), including the basic syntax of ggplot2, can be found in our previous tutorial [DOI:10.5281/zenodo.2565307].
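For readers who have not seen the ggplot() syntax before, here is a small sketch of a trend plot. The data frame trend, its State labels, and its rates are invented for illustration and do not reflect the actual CMS figures.

```r
library(ggplot2)

# Hypothetical toy data: yearly opioid prescription rates for two made-up States
trend <- data.frame(
  year        = rep(2013:2016, times = 2),
  state       = rep(c("A", "B"), each = 4),
  opioid_rate = c(0.060, 0.058, 0.055, 0.052,
                  0.070, 0.069, 0.066, 0.061),
  stringsAsFactors = FALSE
)

# Map year to x, rate to y, and State to colour; then add layers with `+`
p <- ggplot(trend, aes(x = year, y = opioid_rate, colour = state)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +  # show rates as percentages
  labs(
    x = "Year",
    y = "Opioid prescription rate",
    colour = "State",
    title = "Opioid prescription rate over time (toy data)"
  )

# Build the plot without drawing it, to check that it is valid
built <- ggplot_build(p)
```

Printing p in an interactive session draws the plot.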
California, Florida, and Texas were the three States with the highest opioid prescription claim counts in Part D in 2016. However, after taking the population of the States into account, they were not the three States with the highest opioid prescription rates. Instead, Tennessee, Alabama, and Oklahoma had the highest opioid prescription claims per 100 enrollments. In general, the opioid prescription rate trended downward during 2013-2016 across most of the States. However, the data for this analysis came from Medicare Part D, so opioid prescription information for relatively younger adults was missing from the analysis. Combining these data with opioid-related policies across States may be more informative.
The libraries used in this study are kableExtra, readr, tidyverse, datasets, ggplot2, scales, ggrepel, choroplethr, and choroplethrMaps. To run this code, please ensure that you have these packages installed.
The objective of this tutorial is for students to become familiar with important skills in data science, including data import (readr), data wrangling (tidyverse), and data visualization (ggplot2, scales, ggrepel, choroplethr, and choroplethrMaps). Two other useful libraries are also used: kableExtra (for organizing the tables displayed on the webpage) and datasets (for getting geographical information about each State).
Two potential extensions of this case study are also provided.
This material is designed for 3.0 teaching hours. (One potential way to teach this tutorial is to divide the material into two 1.5-hour sessions. The first session focuses on data import and data wrangling, and the second session focuses on data visualization.)