Commit

Permalink
Merge branch 'main' of https://github.com/jhudsl/intro_to_r into main
jhudsl-robot committed Jan 20, 2023
2 parents 4271452 + ec41cc0 commit 61cb270
Showing 32 changed files with 475 additions and 15,678 deletions.
2 changes: 1 addition & 1 deletion index.html
@@ -313,7 +313,7 @@ <h2>Class</h2>
<h2>Find an Error!?</h2>
<hr />
<p>Feel free to submit typos/errors/etc via the GitHub repository associated with the class: <a href="https://github.com/jhudsl/intro_to_r" class="uri">https://github.com/jhudsl/intro_to_r</a></p>
<p>This page was last updated on 2023-01-19.</p>
<p>This page was last updated on 2023-01-20.</p>
<p style="text-align:center;">
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://live.staticflickr.com/4557/26350808799_6f9c8bcaa2_b.jpg" height="150"/> </a>
</p>
20 changes: 10 additions & 10 deletions modules/Basic_R/lab/Basic_R_Lab_Key.html
@@ -308,16 +308,16 @@ <h1>Part 3</h1>
replace = TRUE
)
my_responses</code></pre>
<pre><code>## [1] &quot;Strongly Agree&quot; &quot;Strongly Agree&quot; &quot;Neutral&quot;
## [4] &quot;Agree&quot; &quot;Agree&quot; &quot;Disagree&quot;
## [7] &quot;Neutral&quot; &quot;Agree&quot; &quot;Disagree&quot;
## [10] &quot;Disagree&quot; &quot;Strongly Disagree&quot; &quot;Strongly Disagree&quot;
## [13] &quot;Strongly Agree&quot; &quot;Agree&quot; &quot;Disagree&quot;
## [16] &quot;Neutral&quot; &quot;Strongly Disagree&quot; &quot;Neutral&quot;
## [19] &quot;Strongly Agree&quot; &quot;Neutral&quot; &quot;Disagree&quot;
## [22] &quot;Agree&quot; &quot;Strongly Disagree&quot; &quot;Disagree&quot;
## [25] &quot;Strongly Agree&quot; &quot;Agree&quot; &quot;Strongly Disagree&quot;
## [28] &quot;Agree&quot; &quot;Neutral&quot; &quot;Agree&quot;</code></pre>
<pre><code>## [1] &quot;Disagree&quot; &quot;Neutral&quot; &quot;Neutral&quot;
## [4] &quot;Neutral&quot; &quot;Neutral&quot; &quot;Disagree&quot;
## [7] &quot;Neutral&quot; &quot;Agree&quot; &quot;Agree&quot;
## [10] &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot; &quot;Agree&quot;
## [13] &quot;Disagree&quot; &quot;Disagree&quot; &quot;Strongly Agree&quot;
## [16] &quot;Strongly Disagree&quot; &quot;Agree&quot; &quot;Strongly Disagree&quot;
## [19] &quot;Agree&quot; &quot;Disagree&quot; &quot;Neutral&quot;
## [22] &quot;Strongly Disagree&quot; &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot;
## [25] &quot;Strongly Agree&quot; &quot;Agree&quot; &quot;Strongly Agree&quot;
## [28] &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot; &quot;Strongly Agree&quot;</code></pre>
<p><strong>Bonus / Extra practice</strong>: Let’s say you change your survey so participants can rank their response 1-10 (inclusive). Create a randomly sampled vector of 30 survey responses. (Hint: use <code>seq()</code> and <code>sample()</code>, and set the <code>replace</code> argument to <code>TRUE</code>.) Store the output as <code>my_responses_2</code>. Examine the data by typing the name in the Console or by using a function.</p>
<pre class="r"><code>my_responses_2 &lt;- sample(
x = seq(from = 1, to = 10),
16 changes: 8 additions & 8 deletions modules/Data_Cleaning/Data_Cleaning.html
@@ -206,7 +206,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
<li>The <code>lubridate</code> package is helpful for dates and times<br/>📃<a href='https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-4.pdf' title=''>Cheatsheet</a></li>
</ul>

</article></slide><slide class=""><hgroup><h2>Data Cleaning</h2></hgroup><article class="emphasized" id="data-cleaning">
</article></slide><slide class=""><hgroup><h2>Data Cleaning</h2></hgroup><article id="data-cleaning" class="emphasized">

<p>In general, data cleaning is a process of investigating your data for inaccuracies, or recoding it in a way that makes it more manageable.</p>

@@ -227,7 +227,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
<li><code>Inf</code> and <code>-Inf</code> - Infinity, happens when you divide a positive number (or negative number) by 0.</li>
</ul>

</article></slide><slide class=""><hgroup><h2>Finding Missing data</h2></hgroup><article class="small" id="finding-missing-data">
</article></slide><slide class=""><hgroup><h2>Finding Missing data</h2></hgroup><article id="finding-missing-data" class="small">

<ul>
<li><code>is.na</code> - looks for <code>NaN</code> and <code>NA</code></li>
@@ -253,7 +253,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<pre >[1] FALSE FALSE TRUE</pre>

</article></slide><slide class=""><hgroup><h2>Useful checking functions</h2></hgroup><article class="small" id="useful-checking-functions">
</article></slide><slide class=""><hgroup><h2>Useful checking functions</h2></hgroup><article id="useful-checking-functions" class="small">

<ul>
<li><code>any</code> will be <code>TRUE</code> if ANY are true
@@ -405,7 +405,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
<pre class = 'prettyprint lang-r'>df &lt;-tibble(Dog = c(0, NA, 2, 3, 1, 1),
Cat = c(NA, 8, 6, NA, 2, NA))</pre>

</article></slide><slide class=""><hgroup><h2>filter() and missing data</h2></hgroup><article class="codesmall" id="filter-and-missing-data-2">
</article></slide><slide class=""><hgroup><h2>filter() and missing data</h2></hgroup><article id="filter-and-missing-data-2" class="codesmall">

<pre class = 'prettyprint lang-r'>df</pre>

@@ -471,7 +471,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
1 2 6
2 1 2</pre>

</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article class="codesmall" id="drop-columns-with-any-missing-values">
</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article id="drop-columns-with-any-missing-values" class="codesmall">

<p>Use the <code>miss_var_which()</code> function from <code>naniar</code></p>

@@ -492,7 +492,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<pre >[1] &quot;Dog&quot; &quot;Cat&quot;</pre>

</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article class="codesmall" id="drop-columns-with-any-missing-values-1">
</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article id="drop-columns-with-any-missing-values-1" class="codesmall">

<p>The <code>miss_var_which()</code> function from <code>naniar</code> (needs a data frame)</p>

@@ -555,7 +555,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<p>⚠️ You might want to keep the <code>NA</code> values so that you know the original sample size.</p>

</article></slide><slide class=""><hgroup><h2>Word of caution</h2></hgroup><article class="codesmall" id="word-of-caution">
</article></slide><slide class=""><hgroup><h2>Word of caution</h2></hgroup><article id="word-of-caution" class="codesmall">

<p>⚠️ Calculating percentages will give you a different result depending on whether you include <code>NA</code> values!</p>

@@ -1422,7 +1422,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
4 2 4 2 50%
5 3 3 4 100% </pre>

</article></slide><slide class=""><hgroup><h2>Removing columns with threshold of percent missing values</h2></hgroup><article class="codesmall" id="removing-columns-with-threshold-of-percent-missing-values">
</article></slide><slide class=""><hgroup><h2>Removing columns with threshold of percent missing values</h2></hgroup><article id="removing-columns-with-threshold-of-percent-missing-values" class="codesmall">

<pre class = 'prettyprint lang-r'>is.na(df) %&gt;% head(n = 3)</pre>

Binary file modified modules/Data_Cleaning/Data_Cleaning.pdf
Binary file not shown.
Binary file modified modules/Data_Input/Data_Input.pdf
Binary file not shown.
Binary file modified modules/Data_Output/Data_Output.pdf
Binary file not shown.
4 changes: 2 additions & 2 deletions modules/Data_Summarization/Data_Summarization.html
@@ -248,7 +248,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
<pre > 0% 25% 50% 75% 100%
1.0 2.5 4.5 6.5 8.0 </pre>

</article></slide><slide class=""><hgroup><h2>Statistical summarization</h2></hgroup><article class="codesmall" id="statistical-summarization-2">
</article></slide><slide class=""><hgroup><h2>Statistical summarization</h2></hgroup><article id="statistical-summarization-2" class="codesmall">

<p>We will talk more about data types later, but you can only do summarization on numeric or logical types. Not characters.</p>

@@ -6051,7 +6051,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
16 2014 334 15.7
17 2015 577 15.2</pre>

</article></slide><slide class=""><hgroup><h2>Counting</h2></hgroup><article class="codesmall" id="counting-1">
</article></slide><slide class=""><hgroup><h2>Counting</h2></hgroup><article id="counting-1" class="codesmall">

<p><code>count()</code>, <code>table()</code>, and <code>n()</code> can all give very similar information.</p>

Binary file modified modules/Data_Summarization/Data_Summarization.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion modules/Data_Visualization/Data_Visualization.html

Large diffs are not rendered by default.

Binary file modified modules/Data_Visualization/Data_Visualization.pdf
Binary file not shown.
12 changes: 6 additions & 6 deletions modules/Factors/Factors.html
@@ -244,7 +244,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
<pre >## [1] yellow red red blue yellow blue
## Levels: blue red yellow</pre>

</article></slide><slide class=""><hgroup><h2>A Factor Example</h2></hgroup><article id="a-factor-example" class="smaller">
</article></slide><slide class=""><hgroup><h2>A Factor Example</h2></hgroup><article class="smaller" id="a-factor-example">

<p>We will use data on student dropouts from the State of California during the 2016-2017 school year. More on this data can be found here: <a href='https://www.cde.ca.gov/ds/ad/filesdropouts.asp' title=''>https://www.cde.ca.gov/ds/ad/filesdropouts.asp</a></p>

@@ -422,7 +422,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<p>Now that’s more like it! Notice how the data is automatically plotted in the order we would like.</p>

</article></slide><slide class=""><hgroup><h2>What about if we <code>arrange()</code> the data by grade ?</h2></hgroup><article id="what-about-if-we-arrange-the-data-by-grade" class="smaller">
</article></slide><slide class=""><hgroup><h2>What about if we <code>arrange()</code> the data by grade ?</h2></hgroup><article class="smaller" id="what-about-if-we-arrange-the-data-by-grade">

<p>Character data is arranged alphabetically.</p>

@@ -446,7 +446,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<p>Notice that the order is not what we would hope for!</p>

</article></slide><slide class=""><hgroup><h2>Arranging Factors</h2></hgroup><article id="arranging-factors" class="smaller">
</article></slide><slide class=""><hgroup><h2>Arranging Factors</h2></hgroup><article class="smaller" id="arranging-factors">

<p>Factor data is arranged by level.</p>

@@ -502,7 +502,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
## 3 Junior 2
## 4 Senior 13</pre>

</article></slide><slide class=""><hgroup><h2><code>forcats</code> for ordering</h2></hgroup><article id="forcats-for-ordering" class="smaller">
</article></slide><slide class=""><hgroup><h2><code>forcats</code> for ordering</h2></hgroup><article class="smaller" id="forcats-for-ordering">

<p>What if we wanted to order <code>grade</code> by increasing <code>n_dropouts</code>?</p>

@@ -517,7 +517,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>

<p>This would be useful for identifying easily which grade to focus on.</p>

</article></slide><slide class=""><hgroup><h2>forcats for ordering</h2></hgroup><article id="forcats-for-ordering-1" class="smaller">
</article></slide><slide class=""><hgroup><h2>forcats for ordering</h2></hgroup><article class="smaller" id="forcats-for-ordering-1">

<p>We can order a factor by another variable by using the <code>fct_reorder()</code> function of the <code>forcats</code> package.</p>

@@ -552,7 +552,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
## 5 15633216009179 Junior 0 4
## 6 33670330113647 Sophomore 0 0</pre>

</article></slide><slide class=""><hgroup><h2>Plotting new variable</h2></hgroup><article id="plotting-new-variable" class="smaller">
</article></slide><slide class=""><hgroup><h2>Plotting new variable</h2></hgroup><article class="smaller" id="plotting-new-variable">

<p>Now let’s plot each of our variables of interest (n_dropouts and tardy) on the y axis and grade on the x axis. Let’s arrange grade by the amount of each.</p>

43 changes: 24 additions & 19 deletions modules/Functions/Functions.Rmd
@@ -94,9 +94,8 @@ times_2_plus_4 <- function(x) {
return(output)
}
result <-times_2_plus_4(x = 10)
result <- times_2_plus_4(x = 10)
result
```


@@ -115,13 +114,13 @@ times_2_plus_y(x = 10, y = 3)
Functions can have one returned result with multiple outputs.

```{r comment=""}
x_and_y_plus_2<- function(x, y){
output1 <- x + 2
output2 <- y + 2
x_and_y_plus_2 <- function(x, y) {
output1 <- x + 2
output2 <- y + 2
return(c(output1,output2))
return(c(output1, output2))
}
result <-x_and_y_plus_2(x = 10, y = 3)
result <- x_and_y_plus_2(x = 10, y = 3)
result
```
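
Returning a vector works when every output has the same type. As a minimal sketch outside this commit, a named list is one alternative when callers want to pick outputs by name. The function name `x_and_y_plus_2_list` and the element names `x_plus_2` and `y_plus_2` are hypothetical, chosen only for illustration.

```r
# Hypothetical variant of x_and_y_plus_2() that returns a named list
x_and_y_plus_2_list <- function(x, y) {
  output1 <- x + 2
  output2 <- y + 2
  # A list keeps each output under its own name
  return(list(x_plus_2 = output1, y_plus_2 = output2))
}

result_list <- x_and_y_plus_2_list(x = 10, y = 3)
result_list$x_plus_2
result_list$y_plus_2
```

Each element can then be extracted with `$` instead of by position.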

@@ -243,14 +242,17 @@ iris %>% sapply(class)

```{r}
select(cars, VehYear:VehicleAge) %>% head()
select(cars, VehYear:VehicleAge) %>% sapply(times_2) %>% head()
select(cars, VehYear:VehicleAge) %>%
sapply(times_2) %>%
head()
```

## Using your custom functions "on the fly" to iterate

```{r comment=""}
select(cars, VehYear:VehicleAge) %>%
sapply(function(x) x / 1000) %>% head()
select(cars, VehYear:VehicleAge) %>%
sapply(function(x) x / 1000) %>%
head()
```
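
The `cars` data used above is loaded earlier in the lesson, so the chunk does not run on its own. A self-contained sketch of the same "on the fly" pattern, assuming a small made-up data frame (`toy_df` and its values are hypothetical), might look like this:

```r
# Hypothetical data frame, built only to illustrate the pattern
toy_df <- data.frame(a = c(1000, 2000), b = c(3000, 4000))

# The anonymous function is applied to each column in turn
scaled <- sapply(toy_df, function(x) x / 1000)
scaled
```

Because every column yields a vector of the same length, `sapply()` simplifies the result to a matrix with one column per input column.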
# across

@@ -307,7 +309,7 @@ cars_dbl %>%
Using different `tidyselect()` options:

```{r warning=FALSE}
cars_dbl %>%
cars_dbl %>%
group_by(Make) %>%
summarize(across(.cols = starts_with("Veh"), .fns = mean))
```
@@ -319,9 +321,10 @@ Combining with `mutate()`: rounding to the nearest power of 10 (with negative di
```{r}
cars_dbl %>%
mutate(across(
.cols = starts_with("Veh"),
.fns = round,
digits = -3))
.cols = starts_with("Veh"),
.fns = round,
digits = -3
))
```
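
To see what a negative `digits` value does on its own, here is a minimal sketch using base R's `round()` on made-up numbers (the variable names `thousands` and `tens` are illustrative, not from the lesson):

```r
# digits = -3 rounds to the nearest 1000
thousands <- round(c(1234, 2567), digits = -3)
thousands

# digits = -1 rounds to the nearest 10
tens <- round(1234, digits = -1)
tens
```

Inside `across()`, the extra `digits = -3` argument is passed along to `round()` for every selected column.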


@@ -346,19 +349,21 @@ mort %>%
## Use custom functions within `mutate` and `across`

```{r}
times1000 <- function(x) x *1000
times1000 <- function(x) x * 1000
airquality %>%
mutate(across(
.cols = everything(),
.fns = times1000
)) %>% head(n = 2)
)) %>%
head(n = 2)
airquality %>%
mutate(across(
.cols = everything(),
.fns = function(x) x *1000
)) %>% head(n = 2)
.fns = function(x) x * 1000
)) %>%
head(n = 2)
```


@@ -380,7 +385,7 @@ airquality %>% map_df(replace_na, replace = 0)
Lists help us work with multiple data frames

```{r}
AQ_list <- list( AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
str(AQ_list)
```
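
Once the data frames are in a list, the same function can be applied to each one. As a minimal sketch (not part of this commit), `sapply()` with `nrow` counts the rows of every element; `n_rows` is a name introduced here for illustration.

```r
# airquality ships with base R's datasets package
AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)

# Apply nrow() to each data frame; the result is a named vector
n_rows <- sapply(AQ_list, nrow)
n_rows
```

The names of the list (`AQ1`, `AQ2`, `AQ3`) carry over to the result, which makes it easy to tell which count belongs to which data frame.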
