Merge pull request #151 from SISBID/ava/day3
Day 3 updates
avahoffman authored Jul 26, 2023
2 parents 147b697 + 81f53b1 commit d5dc8bd
Showing 13 changed files with 887 additions and 707 deletions.
17 changes: 8 additions & 9 deletions index.Rmd
@@ -66,28 +66,27 @@ Make sure you have a GitHub profile: https://github.com/
| 11:00-11:30 | Break | | |
| 11:30-12:20 | [Reproducible Research][13] ([PDF][14]) | [lab][15] | [lab key][16] / [lab key output][17] |
| 12:20-12:35 | Break | | |
| 12:35-1:25 | [Data Subsetting Part 1][22] ([PDF][23]) | [lab][24] | [lab key][25] / [lab key output][26] |
| 12:35-1:25 | [Data Subsetting Part 1][22] ([PDF][23]) | [lab][24] | [lab key][25] / [lab key output][26] |
| 1:25-1:40 | Break | | |
| 1:40-2:30 | [Missing Data][x5] ([PDF][x6]) | [lab][x7] | [lab key][x8]/[lab key output][x9] |
| 1:40-2:30 | [Missing Data][x5] ([PDF][x6]) | [lab][x7] | [lab key][x8]/[lab key output][x9] |
| **Day 2** | | | |
| 8:00-8:50 | [Data Subsetting Part 2][27] ([PDF][28]) | [lab][29] | [lab key][30] / [lab key output][31] |
| 8:50-9:05 | Break | | |
| 9:05-9:55 | [Data Summarization][32] ([PDF][33]) | [lab][34] | [lab key][35] / [lab key output][36] |
| 9:05-9:55 | [Data Summarization][32] ([PDF][33]) | [lab][34] | [lab key][35] / [lab key output][36] |
| 9:55-10:10 | Break | | |
| 10:10-11:00 | [Data Cleaning Part 1][37] ([PDF][38]) | [lab][39] | [lab key][40] / [lab key output][41]|
| 10:10-11:00 | [Data Cleaning Part 1][37] ([PDF][38]) | [lab][39] | [lab key][40] / [lab key output][41]|
| 11:00-11:30 | Break | | |
| 11:30-12:20 | [Data Cleaning Part 2][x10] ([PDF][x11]) | [lab][x12] | [lab key][x13] / [lab key output][x14] |
| 11:30-12:20 | [Data Cleaning Part 2][x10] ([PDF][x11]) | [lab][x12] | [lab key][x13] / [lab key output][x14] |
| 12:20-12:35 | Break | | |
| 12:35-1:25 | [Data Reshaping][42] ([PDF][43]) | [lab][44] | [lab key][45] / [lab key output][46] |
| 1:25-1:40 | Break | | |
| 1:40-2:30 | [Data Merging and Joining][47] ([PDF][48]) | [lab][49] | [lab key][50] / [lab key output][51] |
| **Day 3** | | | |
| 8:00-8:50 | [Advanced Data Input/Output][52] ([PDF][53]) | [lab][54] | [lab key][55] / [lab key output][56] |
| 8:00-8:50 | [Functionally Programming][57] ([PDF][58]) | [lab][59] | [lab key][60] / [lab key output][61] |
| 8:50-9:05 | Break | | |
| 9:05-9:55 | [Functionally Programming][57] ([PDF][58]) | [lab][59] | [lab key][60] / [lab key output][61] |
| 9:05-9:55 | [Advanced Data Input/Output][52] ([PDF][53]) | [lab][54] | [lab key][55] / [lab key output][56] |
| 9:55-10:10 | Break | | |
| 10:10-11:00 | [GitHub Concepts][18] ([PDF][19]) | [lab][21] | |
| | [Updating files with GitHub][x3] ([PDF][x4]) | | |
| 10:10-11:00 | [GitHub Concepts][18] ([PDF][19]) | [lab][21] | |
| | [Wrap Up][64] ([PDF][65]) | | |


27 changes: 10 additions & 17 deletions index.html
@@ -602,10 +602,10 @@ <h2>Data Wrangling with R</h2>
</tr>
<tr class="even">
<td>8:00-8:50</td>
<td><a href="lecture_notes/Advanced_Data_IO.html">Advanced Data
Input/Output</a> (<a href="lecture_notes/Advanced_Data_IO.pdf">PDF</a>)</td>
<td><a href="labs/advanced-io-lab.Rmd">lab</a></td>
<td><a href="labs/advanced-io-lab-key.Rmd">lab key</a> / <a href="labs/advanced-io-lab-key.html">lab key output</a></td>
<td><a href="lecture_notes/Functional_Programming.html">Functionally
Programming</a> (<a href="lecture_notes/Functional_Programming.pdf">PDF</a>)</td>
<td><a href="labs/functional-program-lab.Rmd">lab</a></td>
<td><a href="labs/functional-program-lab-key.Rmd">lab key</a> / <a href="labs/functional-program-lab-key.html">lab key output</a></td>
</tr>
<tr class="odd">
<td>8:50-9:05</td>
@@ -615,10 +615,10 @@ <h2>Data Wrangling with R</h2>
</tr>
<tr class="even">
<td>9:05-9:55</td>
<td><a href="lecture_notes/Functional_Programming.html">Functionally
Programming</a> (<a href="lecture_notes/Functional_Programming.pdf">PDF</a>)</td>
<td><a href="labs/functional-program-lab.Rmd">lab</a></td>
<td><a href="labs/functional-program-lab-key.Rmd">lab key</a> / <a href="labs/functional-program-lab-key.html">lab key output</a></td>
<td><a href="lecture_notes/Advanced_Data_IO.html">Advanced Data
Input/Output</a> (<a href="lecture_notes/Advanced_Data_IO.pdf">PDF</a>)</td>
<td><a href="labs/advanced-io-lab.Rmd">lab</a></td>
<td><a href="labs/advanced-io-lab-key.Rmd">lab key</a> / <a href="labs/advanced-io-lab-key.html">lab key output</a></td>
</tr>
<tr class="odd">
<td>9:55-10:10</td>
@@ -635,13 +635,6 @@ <h2>Data Wrangling with R</h2>
</tr>
<tr class="odd">
<td></td>
<td><a href="lecture_notes/Updating_files_with_github.html">Updating
files with GitHub</a> (<a href="lecture_notes/Updating_files_with_github.pdf">PDF</a>)</td>
<td></td>
<td></td>
</tr>
<tr class="even">
<td></td>
<td><a href="https://docs.google.com/presentation/d/18UA0WLDVasiSijFyRJZ3T3GWl8eokv4GBxMzQKy8sjk/edit?usp=sharing">Wrap
Up</a> (<a href="lecture_notes/sisbid_wrap_up_2022.pdf">PDF</a>)</td>
<td></td>
@@ -652,8 +645,8 @@ <h2>Data Wrangling with R</h2>
<p><strong>Miscellaneous</strong></p>
<p>Feel free to submit typos/errors/etc via the github repository
associated with the class: <a href="https://github.com/SISBID/Data-Wrangling" class="uri">https://github.com/SISBID/Data-Wrangling</a></p>

<p>This page was last updated on 2023-07-25 03:55:39 Eastern Time.</p>
<p>This page was last updated on 2023-07-25 22:21:06.099602 Eastern
Time.</p>
</div>


47 changes: 34 additions & 13 deletions labs/advanced-io-lab-key.R
@@ -1,44 +1,65 @@
## ---- include=FALSE------------------------------------------------------------------------------------------------------------------
## ---- include=FALSE-------------------------------------------------------------------------------------------
library(tidyverse)
library(httr)
library(jsonlite)
library(googlesheets4)


## ----eval=FALSE----------------------------------------------------------------------------------------------------------------------
gs4_auth()
## ----eval=FALSE-----------------------------------------------------------------------------------------------
## gs4_auth()


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
sheet_url <- "https://docs.google.com/spreadsheets/d/1KIRtcPVn58R3_qr97WNtcOJiY4AaytHzGDzLW_3_R1s/edit?usp=sharing"


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
x <- read_sheet(sheet_url)


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
sheet_names(sheet_url)


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
read_sheet(sheet_url, range = cell_cols("A:B"))


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
x %>%
filter(str_detect(`Why are you taking this module (free text)`, pattern = "learn"))


## -------------------------------------------------------------------------------------------------------------
jsonData <- fromJSON("https://think.cs.vt.edu/corgis/datasets/json/airlines/airlines.json")


## ---- error = TRUE-------------------------------------------------------------------------------------------------------------------
## ---- error = TRUE--------------------------------------------------------------------------------------------
str(jsonData)
# Airport, Time and Statistics
dim(jsonData$Airport)
dim(jsonData$Time)
dim(jsonData$Statistics)

colnames(jsonData$Statistics)


## -------------------------------------------------------------------------------------------------------------
air_2016 <- jsonData %>%
filter(Time$Year == 2016)


## -------------------------------------------------------------------------------------------------------------
air_2016$Airport %>% count()
# OR
length(unique(air_2016$Airport$Code))


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
lga_ord <- jsonData %>%
filter(Airport$Code %in% c("LGA", "ORD") & Time$Year == 2016)
filter(Airport$Code %in% c("LGA", "ORD"))


## ------------------------------------------------------------------------------------------------------------------------------------
## -------------------------------------------------------------------------------------------------------------
airport_list <- list(
airport_code = lga_ord$Airport$Code,
total_flights = lga_ord$Statistics$Flights$Total,
33 changes: 30 additions & 3 deletions labs/advanced-io-lab-key.Rmd
@@ -9,7 +9,6 @@ editor_options:

```{r, include=FALSE}
library(tidyverse)
library(httr)
library(jsonlite)
library(googlesheets4)
```
@@ -46,6 +45,13 @@ sheet_names(sheet_url)
read_sheet(sheet_url, range = cell_cols("A:B"))
```

5. How often do you see the word "learn" in the dataset `x`? Hint: use a `stringr` function.

```{r}
x %>%
filter(str_detect(`Why are you taking this module (free text)`, pattern = "learn"))
```
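
The answer above returns the matching rows; turning that into a count takes one more step. A minimal sketch, assuming a small made-up tibble `responses` with a hypothetical column `why` standing in for the survey sheet and its longer column name:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the sheet read by read_sheet(); not the real data.
responses <- tibble(
  why = c("I want to learn R",
          "Required for my program",
          "To learn data cleaning",
          NA)
)

# Rows whose free-text answer contains "learn".
# str_detect() returns NA for missing answers, and filter() drops those rows.
learn_rows <- responses %>%
  filter(str_detect(why, pattern = "learn"))

# Count the matching rows.
n_learn <- nrow(learn_rows)

# Equivalent count without filtering: sum the logical vector, skipping NA.
n_learn_alt <- sum(str_detect(responses$why, "learn"), na.rm = TRUE)
```

Both approaches count rows that mention the word at least once; `str_count()` would be the function to reach for if repeated mentions within one answer should be tallied separately.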

## JSON Lab

<!-- See here for more fun practice! https://github.com/jdorfman/Awesome-JSON-Datasets -->
@@ -62,13 +68,34 @@ jsonData <- fromJSON("https://think.cs.vt.edu/corgis/datasets/json/airlines/airl

```{r, error = TRUE}
str(jsonData)
# Airport, Time and Statistics
dim(jsonData$Airport)
dim(jsonData$Time)
dim(jsonData$Statistics)
colnames(jsonData$Statistics)
```
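
The `dim()` and `colnames()` calls above work on `jsonData$Airport` and friends because `fromJSON()` simplifies an array of records with nested objects into a data frame whose columns are themselves data frames. A minimal offline sketch, assuming a made-up two-record JSON string (`json_txt` and `toy` are hypothetical names, and the values are invented) shaped like the airlines file:

```r
library(jsonlite)

# Hypothetical miniature of the airlines data: an array of records,
# each holding nested Airport, Time, and Statistics objects.
json_txt <- '[
  {"Airport": {"Code": "LGA", "Name": "LaGuardia"},
   "Time": {"Year": 2016, "Month": 1},
   "Statistics": {"Flights": {"Total": 100, "Delayed": 20}}},
  {"Airport": {"Code": "ORD", "Name": "Chicago OHare"},
   "Time": {"Year": 2016, "Month": 1},
   "Statistics": {"Flights": {"Total": 250, "Delayed": 60}}}
]'

# Default simplification turns the array into a data frame with
# one row per record and one data-frame column per nested object.
toy <- fromJSON(json_txt)

names(toy)                 # top-level columns
dim(toy$Airport)           # rows and columns of the nested Airport data frame
colnames(toy$Statistics)   # nested one level further down
toy$Statistics$Flights$Total
```

Passing `flatten = TRUE` to `fromJSON()` is an alternative that produces ordinary columns with dotted names instead of nested data frames.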

3. Filter `jsonData` to include only Year : 2016. Call this `air_2016`.

```{r}
air_2016 <- jsonData %>%
filter(Time$Year == 2016)
```

4. How many unique Airports are there?

```{r}
air_2016$Airport %>% count()
# OR
length(unique(air_2016$Airport$Code))
```

3. Filter `jsonData` to include only NYC LaGuardia and Chicago O'Hare airports (Code : "LGA", "ORD") and Year : 2016. Call this `lga_ord`.
5. Filter `air_2016` to include only NYC LaGuardia and Chicago O'Hare airports (Code : "LGA", "ORD"). Call this `lga_ord`.

```{r}
lga_ord <- jsonData %>%
filter(Airport$Code %in% c("LGA", "ORD") & Time$Year == 2016)
filter(Airport$Code %in% c("LGA", "ORD"))
```

**Bonus Practice**
286 changes: 165 additions & 121 deletions labs/advanced-io-lab-key.html

Large diffs are not rendered by default.

27 changes: 22 additions & 5 deletions labs/advanced-io-lab.Rmd
@@ -9,7 +9,6 @@ editor_options:

```{r, include=FALSE}
library(tidyverse)
library(httr)
library(jsonlite)
library(googlesheets4)
```
@@ -22,10 +21,10 @@ Make sure you go through the authentication process. You'll see a popup and will
gs4_auth()
```

1. We are going to use a sheet from previous years: https://docs.google.com/spreadsheets/d/1KIRtcPVn58R3_qr97WNtcOJiY4AaytHzGDzLW_3_R1s/edit?usp=sharing. Save this link string as "sheet_url".
1. We are going to use a sheet from previous years: https://docs.google.com/spreadsheets/d/1KIRtcPVn58R3_qr97WNtcOJiY4AaytHzGDzLW_3_R1s/edit?usp=sharing

```{r}
sheet_url <- "https://docs.google.com/spreadsheets/d/1KIRtcPVn58R3_qr97WNtcOJiY4AaytHzGDzLW_3_R1s/edit?usp=sharing"
```

2. Use the `read_sheet()` function to read in the data like we discussed in class, call this object `x`.
@@ -46,13 +45,19 @@ gs4_auth()
```

5. How often do you see the word "learn" in the dataset `x`? Hint: use a `stringr` function.

```{r}
```

## JSON Lab

<!-- See here for more fun practice! https://github.com/jdorfman/Awesome-JSON-Datasets -->

The following dataset lists airports in the US and details about the number of late flights over time.

1. Read in data from the following link using the `fromJSON()` function: https://think.cs.vt.edu/corgis/datasets/json/airlines/airlines.json. Call this `jsonData`.
1. Read in data from the following link: https://think.cs.vt.edu/corgis/datasets/json/airlines/airlines.json. Call this `jsonData`.

```{r}
@@ -64,7 +69,19 @@ The following dataset lists airports in the US and details about the number of l
```

3. Filter `jsonData` to include only NYC LaGuardia and Chicago O'Hare airports (Code : "LGA", "ORD") and Year : 2016. Call this `lga_ord`.
3. Filter `jsonData` to include only Year : 2016. Call this `air_2016`.

```{r}
```

4. How many unique Airports are there?

```{r}
```

5. Filter `air_2016` to include only NYC LaGuardia and Chicago O'Hare airports (Code : "LGA", "ORD"). Call this `lga_ord`.

```{r}