Merge pull request #98 from fhdsl/manipulating-data
Manipulating Data - minor changes
avahoffman authored Jul 15, 2024
2 parents 4a1db34 + 54b71b1 commit 691a7f1
Showing 7 changed files with 87 additions and 88 deletions.
44 changes: 22 additions & 22 deletions modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd
@@ -3,7 +3,8 @@ title: "Manipulating Data in R"
output:
ioslides_presentation:
css: ../../docs/styles.css
widescreen: yes
widescreen: true
beamer_presentation: default
---

```{r, echo = FALSE, include=FALSE}
@@ -188,17 +189,19 @@ long_vacc

Newly created column names are enclosed in quotation marks.

## Data used: CO Heat-related ER visits
## Data used: Nitrate exposure

https://daseh.org/data/CO_heat_er_visits_DenverBoulder_wide.csv
Nitrate exposure by quarter for populations on public water systems in the state of Washington for 1999-2020.

https://daseh.org/data/Nitrate_Exposure_for_WA_Public_Water_Systems_byquarter_data.csv

```{r, message = FALSE}
library(dasehr)
wide <- nitrate
head(nitrate)
```

## Mission: Taking the average proportion of the population exposed by concentration
## Mission: Average population exposed by concentration

Let's imagine we want to see what proportion of the population is exposed to different nitrate concentrations. Results should look something like:

@@ -210,11 +213,18 @@ example <- tibble(
example
```

## Remove some columns we don't need

```{r}
wide <- wide %>%
select(!ends_with("exceedances"))
wide
```
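The column-dropping step above can be tried without the `dasehr` package. The following is a minimal sketch on a made-up tibble; the column names `pop_0_3` and `pop_exceedances` and all values are hypothetical and only imitate the naming pattern in the slides.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: names and values are invented for illustration.
toy <- tibble(
  year = 1999,
  pop_0_3 = 80,
  pop_exceedances = 5
)

# `!` negates the tidyselect helper, so every column whose name
# ends with "exceedances" is dropped and the rest are kept.
toy_small <- toy %>%
  select(!ends_with("exceedances"))
toy_small
```

Negating the helper keeps the code short when many columns share a suffix, compared with listing each column to drop by name.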

## Reshaping data from **wide to long**

```{r}
long <- wide %>%
select(!ends_with("exceedances")) %>%
pivot_longer(!c(year, quarter, pop_on_sampled_PWS),
names_to = "conc_cat",
values_to = "conc_count")
@@ -223,13 +233,13 @@ long

## Reshaping data from **wide to long**

Un-pivoted columns (`year`, `quarter`, `pop_on_sampled_PWS`) are similar
Un-pivoted columns (`year`, `quarter`, `pop_on_sampled_PWS`) are still columns.

```{r}
long
```
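To see why the un-pivoted columns remain columns, here is a small sketch of the same `pivot_longer()` call on invented data. The concentration-category columns `pop_0_3` and `pop_3_5` and all values are assumptions for illustration, not the real `nitrate` columns.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data shaped like the wide nitrate table.
toy_wide <- tibble(
  year = c(1999, 1999),
  quarter = c("Q1", "Q2"),
  pop_on_sampled_PWS = c(100, 200),
  pop_0_3 = c(80, 150),  # invented concentration category
  pop_3_5 = c(20, 50)    # invented concentration category
)

# Every column except the three identifiers is stacked:
# old column names go to `conc_cat`, their values to `conc_count`.
toy_long <- toy_wide %>%
  pivot_longer(!c(year, quarter, pop_on_sampled_PWS),
               names_to = "conc_cat",
               values_to = "conc_count")
toy_long
```

Each original row becomes one row per pivoted column, so two rows with two category columns yield four rows, with the identifier values repeated.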

## Cleaning up long data
## Cleaning up long data{.codesmall}

Let's make the `conc_count` into a proportion.

@@ -239,7 +249,7 @@ long <- long %>%
long
```
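The body of the `mutate()` call is not visible in this hunk. One plausible way to turn a count into a proportion is sketched below on invented data; it assumes the proportion is the count divided by `pop_on_sampled_PWS`, and it borrows the name `conc_prop` from the summary code elsewhere in this diff.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format toy data; values are invented.
toy_long <- tibble(
  quarter = c("Q1", "Q1", "Q2", "Q2"),
  pop_on_sampled_PWS = c(100, 100, 200, 200),
  conc_cat = c("pop_0_3", "pop_3_5", "pop_0_3", "pop_3_5"),
  conc_count = c(80, 20, 150, 50)
)

# Assumption: proportion = people in the category / total sampled population.
toy_long <- toy_long %>%
  mutate(conc_prop = conc_count / pop_on_sampled_PWS)
toy_long
```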

## Mission: Taking the average proportion of the population exposed by concentration
## Mission: Average population exposed by concentration

Now our data is more tidy, and we can take the averages easily!
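
As a rough illustration of taking those averages, the sketch below groups invented long-format data by category and averages the proportion. The data values and category names are hypothetical; the chunk in the slides itself is collapsed in this diff.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format toy data with a precomputed proportion.
toy_long <- tibble(
  quarter = c("Q1", "Q1", "Q2", "Q2"),
  conc_cat = c("pop_0_3", "pop_3_5", "pop_0_3", "pop_3_5"),
  conc_prop = c(0.8, 0.2, 0.75, 0.25)
)

# One row per concentration category, averaged across quarters.
toy_avg <- toy_long %>%
  group_by(conc_cat) %>%
  summarize(avg_prop = mean(conc_prop))
toy_avg
```

Because each category is a value in one column rather than its own column, a single `group_by()` covers all categories at once.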

@@ -280,13 +290,15 @@ wide_vacc <- long_vacc %>% pivot_wider(names_from = "Month",
wide_vacc
```

## Reshaping CO Heat-related ER Visits data
## Reshaping nitrate exposure data{.codesmall}

What if we wanted different columns for each quarter?

```{r}
long
```

## Reshaping CO Heat-related ER Visits data
## Reshaping nitrate exposure data

```{r}
wide <- long %>%
@@ -295,18 +307,6 @@ wide <- long %>%
wide
```
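The arguments of the `pivot_wider()` call are hidden by the hunk boundary. A minimal sketch of spreading quarters into their own columns, on invented data and assuming `quarter` supplies the new column names and `conc_prop` the values, might look like this:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long-format toy data; values are invented.
toy_long <- tibble(
  year = c(1999, 1999, 1999, 1999),
  quarter = c("Q1", "Q1", "Q2", "Q2"),
  conc_cat = c("pop_0_3", "pop_3_5", "pop_0_3", "pop_3_5"),
  conc_prop = c(0.8, 0.2, 0.75, 0.25)
)

# Each distinct quarter becomes a column; remaining columns
# (year, conc_cat) identify the rows of the wide table.
toy_wide <- toy_long %>%
  pivot_wider(names_from = "quarter", values_from = "conc_prop")
toy_wide
```

Any column not named in `names_from` or `values_from` is treated as an identifier, so columns that vary by quarter (such as a per-quarter population) would need to be dropped or pivoted too to avoid extra rows.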

## Reshaping Summary `tibbles`

Reshaping can be helpful for your assessment of two `group_by()` categories.

```{r}
long %>%
group_by(conc_cat, quarter) %>%
summarize("avg_prop" = mean(conc_prop)) %>%
pivot_wider(names_from = "quarter", values_from = "avg_prop")
```


## Summary

- `tidyr` package helps us convert between wide and long data
91 changes: 47 additions & 44 deletions modules/Manipulating_Data_in_R/Manipulating_Data_in_R.html

Large diffs are not rendered by default.

Binary file modified modules/Manipulating_Data_in_R/Manipulating_Data_in_R.pdf
Binary file not shown.
@@ -9,12 +9,12 @@ editor_options:
knitr::opts_chunk$set(echo = TRUE)
```

Data in this lab comes from the OCS "Exploring CO2 emissions across time" activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/) and the CO Department of Health (https://coepht.colorado.gov/heat-related-illness). Both datasets are available in the `dasehr` package.
Some data in this lab comes from the OCS "Exploring CO2 emissions across time" activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/). This dataset is available in the `dasehr` package.

Additional data about climate change disasters can be found at "https://daseh.org/data/Yearly_CC_Disasters.csv".

```{r message=FALSE}
library(readr)
library(dplyr)
library(tidyr)
library(tidyverse)
library(dasehr)
```

@@ -353,13 +353,11 @@ <h1 class="title toc-ignore">Manipulating Data in R Lab</h1>
</div>


<p>Data in this lab comes from the OCS “Exploring CO2 emissions across
time” activity (<a href="https://www.opencasestudies.org/ocs-bp-co2-emissions/" class="uri">https://www.opencasestudies.org/ocs-bp-co2-emissions/</a>)
and the CO Department of Health (<a href="https://coepht.colorado.gov/heat-related-illness" class="uri">https://coepht.colorado.gov/heat-related-illness</a>). Both
datasets are available in the <code>dasehr</code> package.</p>
<pre class="r"><code>library(readr)
library(dplyr)
library(tidyr)
<p>Some data in this lab comes from the OCS “Exploring CO2 emissions
across time” activity (<a href="https://www.opencasestudies.org/ocs-bp-co2-emissions/" class="uri">https://www.opencasestudies.org/ocs-bp-co2-emissions/</a>).
This dataset is available in the <code>dasehr</code> package.</p>
<p>Additional data about climate change disasters can be found at “<a href="https://daseh.org/data/Yearly_CC_Disasters.csv" class="uri">https://daseh.org/data/Yearly_CC_Disasters.csv</a>”.</p>
<pre class="r"><code>library(tidyverse)
library(dasehr)</code></pre>
<div id="part-1" class="section level1">
<h1>Part 1</h1>
@@ -9,12 +9,12 @@ editor_options:
knitr::opts_chunk$set(echo = TRUE)
```

Data in this lab comes from the OCS "Exploring CO2 emissions across time" activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/) and the CO Department of Health (https://coepht.colorado.gov/heat-related-illness). Both datasets are available in the `dasehr` package.
Some data in this lab comes from the OCS "Exploring CO2 emissions across time" activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/). This dataset is available in the `dasehr` package.

Additional data about climate change disasters can be found at "https://daseh.org/data/Yearly_CC_Disasters.csv".

```{r message=FALSE}
library(readr)
library(dplyr)
library(tidyr)
library(tidyverse)
library(dasehr)
```

@@ -353,13 +353,11 @@ <h1 class="title toc-ignore">Manipulating Data in R Lab - Key</h1>
</div>


<p>Data in this lab comes from the OCS “Exploring CO2 emissions across
time” activity (<a href="https://www.opencasestudies.org/ocs-bp-co2-emissions/" class="uri">https://www.opencasestudies.org/ocs-bp-co2-emissions/</a>)
and the CO Department of Health (<a href="https://coepht.colorado.gov/heat-related-illness" class="uri">https://coepht.colorado.gov/heat-related-illness</a>). Both
datasets are available in the <code>dasehr</code> package.</p>
<pre class="r"><code>library(readr)
library(dplyr)
library(tidyr)
<p>Some data in this lab comes from the OCS “Exploring CO2 emissions
across time” activity (<a href="https://www.opencasestudies.org/ocs-bp-co2-emissions/" class="uri">https://www.opencasestudies.org/ocs-bp-co2-emissions/</a>).
This dataset is available in the <code>dasehr</code> package.</p>
<p>Additional data about climate change disasters can be found at “<a href="https://daseh.org/data/Yearly_CC_Disasters.csv" class="uri">https://daseh.org/data/Yearly_CC_Disasters.csv</a>”.</p>
<pre class="r"><code>library(tidyverse)
library(dasehr)</code></pre>
<div id="part-1" class="section level1">
<h1>Part 1</h1>