Merge branch 'main' of https://github.com/jhudsl/intro_to_r into main

jhudsl · Jan 20, 2023 · 61cb270
2 parents 4271452 + ec41cc0
commit 61cb270
Showing 32 changed files with 475 additions and 15,678 deletions.
diff --git a/index.html b/index.html
@@ -313,7 +313,7 @@ <h2>Class</h2>
 <h2>Find an Error!?</h2>
 <hr />
 <p>Feel free to submit typos/errors/etc via the GitHub repository associated with the class: <a href="https://github.com/jhudsl/intro_to_r" class="uri">https://github.com/jhudsl/intro_to_r</a></p>
-<p>This page was last updated on 2023-01-19.</p>
+<p>This page was last updated on 2023-01-20.</p>
 <p style="text-align:center;">
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://live.staticflickr.com/4557/26350808799_6f9c8bcaa2_b.jpg" height="150"/> </a>
 </p>

diff --git a/modules/Basic_R/lab/Basic_R_Lab_Key.html b/modules/Basic_R/lab/Basic_R_Lab_Key.html
@@ -308,16 +308,16 @@ <h1>Part 3</h1>
   replace = TRUE
 )
 my_responses</code></pre>
-<pre><code>##  [1] &quot;Strongly Agree&quot;    &quot;Strongly Agree&quot;    &quot;Neutral&quot;          
-##  [4] &quot;Agree&quot;             &quot;Agree&quot;             &quot;Disagree&quot;         
-##  [7] &quot;Neutral&quot;           &quot;Agree&quot;             &quot;Disagree&quot;         
-## [10] &quot;Disagree&quot;          &quot;Strongly Disagree&quot; &quot;Strongly Disagree&quot;
-## [13] &quot;Strongly Agree&quot;    &quot;Agree&quot;             &quot;Disagree&quot;         
-## [16] &quot;Neutral&quot;           &quot;Strongly Disagree&quot; &quot;Neutral&quot;          
-## [19] &quot;Strongly Agree&quot;    &quot;Neutral&quot;           &quot;Disagree&quot;         
-## [22] &quot;Agree&quot;             &quot;Strongly Disagree&quot; &quot;Disagree&quot;         
-## [25] &quot;Strongly Agree&quot;    &quot;Agree&quot;             &quot;Strongly Disagree&quot;
-## [28] &quot;Agree&quot;             &quot;Neutral&quot;           &quot;Agree&quot;</code></pre>
+<pre><code>##  [1] &quot;Disagree&quot;          &quot;Neutral&quot;           &quot;Neutral&quot;          
+##  [4] &quot;Neutral&quot;           &quot;Neutral&quot;           &quot;Disagree&quot;         
+##  [7] &quot;Neutral&quot;           &quot;Agree&quot;             &quot;Agree&quot;            
+## [10] &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot;    &quot;Agree&quot;            
+## [13] &quot;Disagree&quot;          &quot;Disagree&quot;          &quot;Strongly Agree&quot;   
+## [16] &quot;Strongly Disagree&quot; &quot;Agree&quot;             &quot;Strongly Disagree&quot;
+## [19] &quot;Agree&quot;             &quot;Disagree&quot;          &quot;Neutral&quot;          
+## [22] &quot;Strongly Disagree&quot; &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot;   
+## [25] &quot;Strongly Agree&quot;    &quot;Agree&quot;             &quot;Strongly Agree&quot;   
+## [28] &quot;Strongly Disagree&quot; &quot;Strongly Agree&quot;    &quot;Strongly Agree&quot;</code></pre>
 <p><strong>Bonus / Extra practice</strong>: Let’s say you change your survey so participants can rank their response 1-10 (inclusive). Create a randomly sampled vector of 30 survey responses. (hint use <code>seq()</code> and <code>sample()</code> and set the replace argument to <code>TRUE</code>). Store the output as <code>my_responses_2</code>. Examine the data by typing the name in the Console using a function.</p>
 <pre class="r"><code>my_responses_2 &lt;- sample(
   x = seq(from = 1, to = 10),

diff --git a/modules/Data_Cleaning/Data_Cleaning.html b/modules/Data_Cleaning/Data_Cleaning.html
@@ -206,7 +206,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 <li>The <code>lubridate</code> package is helpful for dates and times<br/>📃<a href='https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-4.pdf' title=''>Cheatsheet</a></li>
 </ul>
 
-</article></slide><slide class=""><hgroup><h2>Data Cleaning</h2></hgroup><article  class="emphasized" id="data-cleaning">
+</article></slide><slide class=""><hgroup><h2>Data Cleaning</h2></hgroup><article  id="data-cleaning" class="emphasized">
 
 <p>In general, data cleaning is a process of investigating your data for inaccuracies, or recoding it in a way that makes it more manageable.</p>
 
@@ -227,7 +227,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 <li><code>Inf</code> and <code>-Inf</code> - Infinity, happens when you divide a positive number (or negative number) by 0.</li>
 </ul>
 
-</article></slide><slide class=""><hgroup><h2>Finding Missing data</h2></hgroup><article  class="small" id="finding-missing-data">
+</article></slide><slide class=""><hgroup><h2>Finding Missing data</h2></hgroup><article  id="finding-missing-data" class="small">
 
 <ul>
 <li><code>is.na</code> - looks for <code>NAN</code> and <code>NA</code></li>
@@ -253,7 +253,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <pre >[1] FALSE FALSE  TRUE</pre>
 
-</article></slide><slide class=""><hgroup><h2>Useful checking functions</h2></hgroup><article  class="small" id="useful-checking-functions">
+</article></slide><slide class=""><hgroup><h2>Useful checking functions</h2></hgroup><article  id="useful-checking-functions" class="small">
 
 <ul>
 <li><code>any</code> will be <code>TRUE</code> if ANY are true
@@ -405,7 +405,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 <pre class = 'prettyprint lang-r'>df &lt;-tibble(Dog = c(0, NA, 2, 3, 1, 1), 
             Cat = c(NA, 8, 6, NA, 2, NA))</pre>
 
-</article></slide><slide class=""><hgroup><h2>filter() and missing data</h2></hgroup><article  class="codesmall" id="filter-and-missing-data-2">
+</article></slide><slide class=""><hgroup><h2>filter() and missing data</h2></hgroup><article  id="filter-and-missing-data-2" class="codesmall">
 
 <pre class = 'prettyprint lang-r'>df</pre>
 
@@ -471,7 +471,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 1     2     6
 2     1     2</pre>
 
-</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article  class="codesmall" id="drop-columns-with-any-missing-values">
+</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article  id="drop-columns-with-any-missing-values" class="codesmall">
 
 <p>Use the <code>miss_var_which()</code> function from <code>naniar</code></p>
 
@@ -492,7 +492,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <pre >[1] &quot;Dog&quot; &quot;Cat&quot;</pre>
 
-</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article  class="codesmall" id="drop-columns-with-any-missing-values-1">
+</article></slide><slide class=""><hgroup><h2>Drop <strong>columns</strong> with any missing values</h2></hgroup><article  id="drop-columns-with-any-missing-values-1" class="codesmall">
 
 <p><code>miss_var_which</code> and function from <code>naniar</code> (need a data frame)</p>
 
@@ -555,7 +555,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <p>⚠️ You might want to keep the <code>NA</code> values so that you know the original sample size.</p>
 
-</article></slide><slide class=""><hgroup><h2>Word of caution</h2></hgroup><article  class="codesmall" id="word-of-caution">
+</article></slide><slide class=""><hgroup><h2>Word of caution</h2></hgroup><article  id="word-of-caution" class="codesmall">
 
 <p>⚠️ Calculating percentages will give you a different result depending on your choice to include NA values.!</p>
 
@@ -1422,7 +1422,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 4     2     4     2 50%  
 5     3     3     4 100% </pre>
 
-</article></slide><slide class=""><hgroup><h2>Removing columns with threshold of percent missing values</h2></hgroup><article  class="codesmall" id="removing-columns-with-threshold-of-percent-missing-values">
+</article></slide><slide class=""><hgroup><h2>Removing columns with threshold of percent missing values</h2></hgroup><article  id="removing-columns-with-threshold-of-percent-missing-values" class="codesmall">
 
 <pre class = 'prettyprint lang-r'>is.na(df) %&gt;% head(n = 3)</pre>
 

diff --git a/modules/Data_Cleaning/Data_Cleaning.pdf b/modules/Data_Cleaning/Data_Cleaning.pdf
diff --git a/modules/Data_Input/Data_Input.pdf b/modules/Data_Input/Data_Input.pdf
diff --git a/modules/Data_Output/Data_Output.pdf b/modules/Data_Output/Data_Output.pdf
diff --git a/modules/Data_Summarization/Data_Summarization.html b/modules/Data_Summarization/Data_Summarization.html
@@ -248,7 +248,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 <pre >  0%  25%  50%  75% 100% 
  1.0  2.5  4.5  6.5  8.0 </pre>
 
-</article></slide><slide class=""><hgroup><h2>Statistical summarization</h2></hgroup><article  class="codesmall" id="statistical-summarization-2">
+</article></slide><slide class=""><hgroup><h2>Statistical summarization</h2></hgroup><article  id="statistical-summarization-2" class="codesmall">
 
 <p>We will talk more about data types later, but you can only do summarization on numeric or logical types. Not characters.</p>
 
@@ -6051,7 +6051,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 16  2014   334  15.7
 17  2015   577  15.2</pre>
 
-</article></slide><slide class=""><hgroup><h2>Counting</h2></hgroup><article  class="codesmall" id="counting-1">
+</article></slide><slide class=""><hgroup><h2>Counting</h2></hgroup><article  id="counting-1" class="codesmall">
 
 <p><code>count()</code>, <code>table()</code>, and <code>n()</code> can all give very similar information.</p>
 

diff --git a/modules/Data_Summarization/Data_Summarization.pdf b/modules/Data_Summarization/Data_Summarization.pdf
diff --git a/modules/Data_Visualization/Data_Visualization.html b/modules/Data_Visualization/Data_Visualization.html
diff --git a/modules/Data_Visualization/Data_Visualization.pdf b/modules/Data_Visualization/Data_Visualization.pdf
diff --git a/modules/Factors/Factors.html b/modules/Factors/Factors.html
@@ -244,7 +244,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 <pre >## [1] yellow red    red    blue   yellow blue  
 ## Levels: blue red yellow</pre>
 
-</article></slide><slide class=""><hgroup><h2>A Factor Example</h2></hgroup><article  id="a-factor-example" class="smaller">
+</article></slide><slide class=""><hgroup><h2>A Factor Example</h2></hgroup><article  class="smaller" id="a-factor-example">
 
 <p>We will use data on student dropouts from the State of California during the 2016-2017 school year. More on this data can be found here: <a href='https://www.cde.ca.gov/ds/ad/filesdropouts.asp' title=''>https://www.cde.ca.gov/ds/ad/filesdropouts.asp</a></p>
 
@@ -422,7 +422,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <p>Now that’s more like it! Notice how the data is automatically plotted in the order we would like.</p>
 
-</article></slide><slide class=""><hgroup><h2>What about if we <code>arrange()</code> the data by grade ?</h2></hgroup><article  id="what-about-if-we-arrange-the-data-by-grade" class="smaller">
+</article></slide><slide class=""><hgroup><h2>What about if we <code>arrange()</code> the data by grade ?</h2></hgroup><article  class="smaller" id="what-about-if-we-arrange-the-data-by-grade">
 
 <p>Character data is arranged alphabetically.</p>
 
@@ -446,7 +446,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <p>Notice that the order is not what we would hope for!</p>
 
-</article></slide><slide class=""><hgroup><h2>Arranging Factors</h2></hgroup><article  id="arranging-factors" class="smaller">
+</article></slide><slide class=""><hgroup><h2>Arranging Factors</h2></hgroup><article  class="smaller" id="arranging-factors">
 
 <p>Factor data is arranged by level.</p>
 
@@ -502,7 +502,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 ## 3 Junior                 2
 ## 4 Senior                13</pre>
 
-</article></slide><slide class=""><hgroup><h2><code>forcats</code> for ordering</h2></hgroup><article  id="forcats-for-ordering" class="smaller">
+</article></slide><slide class=""><hgroup><h2><code>forcats</code> for ordering</h2></hgroup><article  class="smaller" id="forcats-for-ordering">
 
 <p>What if we wanted to order <code>grade</code> by increasing <code>n_dropouts</code>?</p>
 
@@ -517,7 +517,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 
 <p>This would be useful for identifying easily which grade to focus on.</p>
 
-</article></slide><slide class=""><hgroup><h2>forcats for ordering</h2></hgroup><article  id="forcats-for-ordering-1" class="smaller">
+</article></slide><slide class=""><hgroup><h2>forcats for ordering</h2></hgroup><article  class="smaller" id="forcats-for-ordering-1">
 
 <p>We can order a factor by another variable by using the <code>fct_reorder()</code> function of the <code>forcats</code> package.</p>
 
@@ -552,7 +552,7 @@ <h1 data-config-title><!-- populated from slide_config.json --></h1>
 ## 5 15633216009179 Junior             0     4
 ## 6 33670330113647 Sophomore          0     0</pre>
 
-</article></slide><slide class=""><hgroup><h2>Plotting new variable</h2></hgroup><article  id="plotting-new-variable" class="smaller">
+</article></slide><slide class=""><hgroup><h2>Plotting new variable</h2></hgroup><article  class="smaller" id="plotting-new-variable">
 
 <p>Now let’s plot each of our variables of interest (n_dropouts and tardy) on the y axis and grade on the x axis. Let’s arrange grade by the amount of each.</p>
 

diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
@@ -94,9 +94,8 @@ times_2_plus_4 <- function(x) {
   return(output)
 }
 
-result <-times_2_plus_4(x = 10)
+result <- times_2_plus_4(x = 10)
 result
-
 ```
 
 
@@ -115,13 +114,13 @@ times_2_plus_y(x = 10, y = 3)
 Functions can have one returned result with multiple outputs.
 
 ```{r comment=""}
-x_and_y_plus_2<- function(x, y){
-    output1 <- x + 2
-    output2 <- y + 2
+x_and_y_plus_2 <- function(x, y) {
+  output1 <- x + 2
+  output2 <- y + 2
 
-return(c(output1,output2))
+  return(c(output1, output2))
 }
-result <-x_and_y_plus_2(x = 10, y = 3)
+result <- x_and_y_plus_2(x = 10, y = 3)
 result
 ```
 
@@ -243,14 +242,17 @@ iris %>% sapply(class)
 
 ```{r}
 select(cars, VehYear:VehicleAge) %>% head()
-select(cars, VehYear:VehicleAge) %>% sapply(times_2) %>% head()
+select(cars, VehYear:VehicleAge) %>%
+  sapply(times_2) %>%
+  head()
 ```
 
 ## Using your custom functions "on the fly" to iterate
 
 ```{r comment=""}
-select(cars, VehYear:VehicleAge) %>% 
-  sapply(function(x) x / 1000) %>% head()
+select(cars, VehYear:VehicleAge) %>%
+  sapply(function(x) x / 1000) %>%
+  head()
 ```
 # across
 
@@ -307,7 +309,7 @@ cars_dbl %>%
 Using different `tidyselect()` options:
 
 ```{r warning=FALSE}
-cars_dbl %>% 
+cars_dbl %>%
   group_by(Make) %>%
   summarize(across(.cols = starts_with("Veh"), .fns = mean))
 ```
@@ -319,9 +321,10 @@ Combining with `mutate()`: rounding to the nearest power of 10 (with negative di
 ```{r}
 cars_dbl %>%
   mutate(across(
-    .cols = starts_with("Veh"), 
-    .fns = round, 
-    digits = -3))
+    .cols = starts_with("Veh"),
+    .fns = round,
+    digits = -3
+  ))
 ```
 
 
@@ -346,19 +349,21 @@ mort %>%
 ## Use custom functions within `mutate` and `across`
 
 ```{r}
-times1000 <- function(x) x *1000
+times1000 <- function(x) x * 1000
 
 airquality %>%
   mutate(across(
     .cols = everything(),
     .fns  = times1000
-  )) %>% head(n = 2)
+  )) %>%
+  head(n = 2)
 
 airquality %>%
   mutate(across(
     .cols = everything(),
-    .fns  = function(x) x *1000
-  )) %>% head(n = 2)
+    .fns  = function(x) x * 1000
+  )) %>%
+  head(n = 2)
 ```
 
 
@@ -380,7 +385,7 @@ airquality %>% map_df(replace_na, replace = 0)
 Lists help us work with multiple data frames
 
 ```{r}
-AQ_list <- list( AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
+AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
 str(AQ_list)
 ```