diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000..f7d6e2a Binary files /dev/null and b/.DS_Store differ diff --git a/config.yaml b/config.yaml index f711a90..ffbd7a6 100644 --- a/config.yaml +++ b/config.yaml @@ -11,10 +11,10 @@ carpentry: 'incubator' # Overall title for pages. -title: 'Lesson Title' # FIXME +title: 'Visualizations in R' # FIXME # Date the lesson was created (YYYY-MM-DD, this is empty by default) -created: ~ # FIXME +created: ~ '2024-07-02' # Comma-separated list of keywords for the lesson keywords: 'software, data, lesson, The Carpentries' # FIXME @@ -58,17 +58,18 @@ contact: 'team@carpentries.org' # FIXME # - another-learner.md # Order of episodes in your lesson -episodes: -- introduction.Rmd +episodes: +- high_level_plots.Rmd +- low_level_plots.Rmd # Information for Learners -learners: +learners: # Information for Instructors -instructors: +instructors: # Learner Profiles -profiles: +profiles: # Customisation --------------------------------------------- # diff --git a/episodes/.DS_Store b/episodes/.DS_Store new file mode 100644 index 0000000..c4e97ce Binary files /dev/null and b/episodes/.DS_Store differ diff --git a/episodes/fig/.DS_Store b/episodes/fig/.DS_Store new file mode 100644 index 0000000..89be50a Binary files /dev/null and b/episodes/fig/.DS_Store differ diff --git a/episodes/fig/console-completion.png b/episodes/fig/console-completion.png new file mode 100644 index 0000000..6b8a13f Binary files /dev/null and b/episodes/fig/console-completion.png differ diff --git a/episodes/high_level_plots.Rmd b/episodes/high_level_plots.Rmd new file mode 100644 index 0000000..0483feb --- /dev/null +++ b/episodes/high_level_plots.Rmd @@ -0,0 +1,151 @@ +--- +title: "High Level Plot Functions" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How do you create and customize basic plots in R using base graphics? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Understand high-level plotting functions. +- Create basic plots like histograms and barplots. + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Histogram + +A **histogram** is a plot which groups values from a single numeric variable into intervals, called **bins**, and displays the frequency (or relative frequency) of values within each bin. + +The **hist()** function creates histograms. The first argument (generically called x) specifies a numeric vector that represents the numeric variable we want to visualize. There are various optional arguments that can be used to control the details of the plot. We will first consider the hist() function with its default settings using the pre-existing data set `trees` within R. + +```{r} +hist(trees$Girth) +``` +With that, we have successfully created our first histogram by simply just specifying the data input. However, there are a few components of the plot to note: + + - By default, R chooses the number of bins (or, equivalently, the bin width) by splitting the range of the values in x into approximately $log2(n) + 1$ intervals of equal length, where n is the number of values in x. This is known as *Sturges’ rule*. This is used as a guideline, but R also attempts to put breaks at round numbers. For example, in this case, the bin width is exactly 2, which produces more bins than Sturges’ rule would produce. + + - **breaks**: The number of breaks can be (approximately) specified using the optional breaks argument. There are several ways to use breaks, such as specifying the number of bins as a single value or specifying a different rule for computing the bins (either "Freedman-Diaconis" or "Scott"). By default, breaks is set to breaks = "Sturges". + + - **freqs**: The hist() function by default outputs a frequency histogram, where the height of each bar indicates the frequency (i.e., number) of values within the corresponding bin. By setting the freq argument to FALSE, hist() will output a relative frequency histogram, where the area of each bar indicates the relative frequency (i.e., proportion) of values within the corresponding bin. + +There are more optional arguments that control how the histogram is visualized and/or constructed. If you are interested in learning more, please refer to the [histogram documentations](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/hist). + +In addition, there are components that are not specific to histograms (or the hist() function) but are components that are common to most statistical plots. The arguments to change these components are the same for nearly every high level plotting function in base graphics. The most commonly used arguments are given below. + + - **xlim**: The limits (or range) of the x-axis shown on the plot are set by default to span the range of the bins that contain all the values in x. + - **ylim**: The limits (or range) of the y-axis shown on the plot are set by default to range from 0 to the height of the highest bar(s) in the histogram. + - **xlab**: The label on the x-axis (set by default to the name of the input vector). + - **ylab**: The label on the y-axis + - **main**: The title of the histogram + - **col**: The color of the bars in the histogram + +::::::::::::::::::::::::::::::::::::: challenge + +## Challenge 1: Can you do it? + +Generate a histogram with `trees$Girth` and modify several arguments to customize the plot as follows: + + - Set the number of breaks (bins) to 10 + - Display relative frequencies instead of counts + - Set the x-axis limits to range from 5 to 25 + - Use "blue" and "goldenrod" as the colors for the bars + - Label the x-axis as "Girth of Black Cherry Trees" + - Set the title of the histogram to "Histogram of Girth of Black Cherry Trees". + + +:::::::::::::::::::::::: solution + +## Solution + +```{r} +hist( + trees$Girth, + breaks = 10, + freq = FALSE, + xlim = c(5, 25), + col = c("blue", "goldenrod"), + xlab = "Girth of Black Cherry Trees", + main = "Histogram of Girth\nof Black Cherry Trees" +) + +``` + +:::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::: + + +## Barplots + +The most common statistical plot for visualizing categorical data is the bar plot (or bar chart or bar graph), which shows a bar for each observed category. The height of the bar is proportional to the frequency of that category. + +The **barplot()** function creates bar plots. The first argument, called height, specifies the heights of the bars in the bar plot, which correspond to the frequencies (or relative frequencies) for the categorical variable(s) we want to visualize. The height argument can be a numeric vector or matrix. The type of bar plot that **barplot()** outputs depends on the class of the height object. + +### Vector Inputs +If the height argument is a numeric vector, the barplot() function will return a (simple) bar plot. We can demonstrate it again using the built-in data set `mtcars`,which contains various characteristics of 32 different car models. The `am` variable represents the type of transmission in the cars. +```{r} +# Summarize the 'am' variable +transmission_freqs <- table(mtcars$am) +barplot(transmission_freqs, + main = "Bar Plot of Transmission Types", + xlab = "Transmission (0 = Automatic, 1 = Manual)", + ylab = "Frequency", + col = c("lightblue", "lightgreen")) + +``` + +### Matrix Inputs +If the height argument is a numeric matrix, the **barplot()** function will return a stacked (or segmented) bar plot. The matrix input typically represents a two-way (contingency) table for two categorical variables. Each bar of the plot corresponds to a column of height, and the values in the column correspond to the heights of the stacked sub-bars. +```{r} +library(ggplot2) # Load ggplot2 package +data(diamonds) # Load diamonds data +diamonds_table <- with(diamonds, table(cut, color)) # Create two-way table of cut and color +barplot(diamonds_table) +``` +In addition, if we want to create a **side-by-side** barplot instead of a stacked barplot, we simply set the optional argument `beside = TRUE`. + +The differently shaded sub-bars correspond to the levels of the cut factor. To make the plot more read-able/informative, we can add a legend by setting the argument `legend = TRUE`. Notice that the same arguments that control the shading and color of histogram bars in hist() also work for barplot(). If you are interested in learning more, please refer to the [barplot documentations](https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/barplot). + +::::::::::::::::::::::::::::::::::::: challenge + +## Challenge 2: Play around on your own! + +Can you modify the previous plot to display the bars side by side instead of stacked? Add a legend and customize the colors and shading of the bars.Ensure that the legend is included and that the plot is properly labeled. Try to play around with optional arguments so that the barplot can be more informative. + +:::::::::::::::::::::::: solution + +## Solution + +```{r} +barplot(diamonds_table, + beside = TRUE, + legend = TRUE, + density = 40, + col = 1:5, + main = "Side-by-Side Distribution of Diamond Cuts by Color", + xlab = "Diamond Color", + ylab = "Frequency") +``` + +:::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::: + +Notice that histograms and barplots are not the only high-level plot functions in R. There are more functions, like boxplot and scattergraph. Please refer to specific documentations in R regarding those functions. + + +::::::::::::::::::::::::::::::::::::: keypoints + +- Know how to generate histogramsusing the hist() function +- Know how to generate barplots using the barplot() function +- Creatively use optional arguments to make high-level plots more informative + +:::::::::::::::::::::::::::::::::::::::::::::::: + +[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/episodes/introduction.Rmd b/episodes/introduction.Rmd deleted file mode 100644 index d14b25e..0000000 --- a/episodes/introduction.Rmd +++ /dev/null @@ -1,125 +0,0 @@ ---- -title: "Using RMarkdown" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How do you write a lesson using R Markdown and `{sandpaper}`? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Explain how to use markdown with the new lesson template -- Demonstrate how to include pieces of code, figures, and nested challenge blocks - -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Introduction - -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown](https://pandoc.org/MANUAL.txt) for static files and -[R Markdown][r-markdown] for dynamic files that can render code into output. -Please refer to the [Introduction to The Carpentries -Workbench](https://carpentries.github.io/sandpaper-docs/) for full documentation. - -What you need to know is that there are three sections required for a valid -Carpentries lesson template: - - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. `objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: challenge - -## Challenge 1: Can you do it? - -What is the output of this command? - -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. - -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can also include figures generated from R Markdown: - -```{r pyramid, fig.alt = "pie chart illusion of a pyramid", fig.cap = "Sun arise each and every morning"} -pie( - c(Sky = 78, "Sunny side of pyramid" = 17, "Shady side of pyramid" = 5), - init.angle = 315, - col = c("deepskyblue", "yellow", "yellow3"), - border = FALSE -) -``` - -Or you can use standard markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -::::::::::::::::::::::::::::::::::::: callout - -Callout sections can highlight information. - -They are sometimes used to emphasise particularly important points -but are also used in some lessons to present "asides": -content that is not central to the narrative of the lesson, -e.g. by providing the answer to a commonly-asked question. - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? - -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally - -:::::::::::::::::::::::::::::::::::::::::::::::: - -[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/episodes/low_level_plots.Rmd b/episodes/low_level_plots.Rmd new file mode 100644 index 0000000..14095d4 --- /dev/null +++ b/episodes/low_level_plots.Rmd @@ -0,0 +1,110 @@ +--- +title: "Low Level Plot Functions" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How do you create and customize basic plots in R using base graphics? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Understand low-level plotting functions. +- Be able to customize existing plots with low-level plotting functions. + +:::::::::::::::::::::::::::::::::::::::::::::::: + +There are numerous low level plot functions in base graphics that add components to an existing plot. + +**Caution**: Low level plot functions can only be used if there is already a plot open. If you run a low level plot command when there is not a plot already open, R will give you an error. + +## Points + +The `points()` function is used to add points to an existing plot. Similar to the basic form of `plot()`, the syntax is `points(x, y, ...)`, where `x` and `y` are numeric vectors that correspond to the coordinates of the points to add. The same optional arguments common to `plot()` can be used to specify the color, size, and type of the points. To demonstrate, we again use the built-in `trees` data set: + +```{r} +plot(Girth ~ Height, data = trees) +points(c(65, 70, 75), c(12, 17, 20), pch = 4, col = "red", cex = 1.5) +``` +The coordinate pairs can alternatively be specified using a a two-column matrix or data frame, a list with two components called `x` and `y`, or a formula `y ~ x`. + +::::::::::::::::::::::::::::::::::::: challenge + +## Challenge 1: Can you do it? + +Use the `points()`function to note which trees have an above average volume. + + +:::::::::::::::::::::::: solution + +## Solution + +```{r} +# Find the observations (trees) with an above average (mean) volume +volume_index <- with(trees, Volume > mean(Volume)) +# Plot the tree girths against height +plot(Girth ~ Height, data = trees) +# Add a blue + to the observations with an above average volume +points(Girth ~ Height, data = trees[volume_index, ], pch = "+", col = "blue", cex = 1.5) +``` + +:::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::: + + +## Lines + +The `lines()` function is used to add connected line segments to an existing plot. The syntax is identical to `points()`, but the output will connect specified coordinates by straight line segments. + +```{r} +plot(Girth ~ Height, data = trees) +# Self-generated points +coords_mat <- cbind(c(65, 70, 75), c(12, 17, 20)) +lines(coords_mat, col = "purple") +``` +**Notice:** Even though `lines()` and `points()` are separate functions, the functionality of both are actually the same. The `points()` function can be used to add line segments by setting the optional argument `type = "l"`. The `lines()` function can be used to add points by setting the optional argument `type = "p"`. + +Lines constructed in base graphics functions (like plot() or lines()) can be modified using line-specific optional arguments. Commonly used arguments are: + + - **lty**: The lty argument controls the line type. Line types can either be specified as an integer (0 is blank, 1 is solid (default), 2 is dashed, 3 is dotted, 4 is dotdash, 5 is longdash, and 6 is twodash) or as one of the character values "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash", where "blank" uses “invisible lines” (i.e., the lines are not drawn). + - **lwd**: The lwd argument controls the line width. Similar to the cex argument for points, the value corresponds to the amount by which the line width should be scaled relative to the default of `lwd = 1`. Values above 1 will make the line wider, and values below 1 will make the line thinner. + +```{r} +plot(Girth ~ Height, data = trees) +lines(Girth ~ Height, data = trees[volume_index, ], col = "green", lty = 2, lwd = 3) +``` +**Notice:**The `lines()` function can also be used to add a smooth density curve over a relative frequency histogram (prob = TRUE). The `density()` function computes a kernel density estimate of the data, which can be visualized as a smooth curve superimposed over a histogram using the `lines()` function. + +::::::::::::::::::::::::::::::::::::: challenge + +## Challenge 2 + +Can you create a histogram of `Girth` with relative frequencies and then use the `lines()` function to add a smooth density curve over the histogram? + +:::::::::::::::::::::::: solution + +## Solution + +```{r} +hist(trees$Girth, prob = TRUE, main = "Histogram of Tree Girth with Density Curve", xlab = "Girth") +lines(density(trees$Girth), lwd = 2, col = "blue") +``` + +:::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::: + + +::::::::::::::::::::::::::::::::::::: keypoints + +- Know how to add points to an existing plot using the points() function +- Know how to add connected line segments to an existing plot using the lines() function +- Understand how to use the lines() function to add points by setting the type argument to "p" +- Learn how to add a smooth density curve over a histogram +:::::::::::::::::::::::::::::::::::::::::::::::: + +[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/index.md b/index.md index af66276..8936513 100644 --- a/index.md +++ b/index.md @@ -4,6 +4,8 @@ site: sandpaper::sandpaper_site This is a new lesson built with [The Carpentries Workbench][workbench]. +This lesson will give you a brief introduction of different techniques and details on how to do data visualizations in R. + [workbench]: https://carpentries.github.io/sandpaper-docs diff --git a/learners/setup.md b/learners/setup.md index 4244a31..98c970d 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -1,54 +1,29 @@ --- title: Setup --- - -FIXME: Setup instructions live in this document. Please specify the tools and -the data sets the Learner needs to have installed. - -## Data Sets - - -Download the [data zip file](https://example.com/FIXME) and unzip it to your Desktop - ## Software Setup -::::::::::::::::::::::::::::::::::::::: discussion - ### Details -Setup for different systems can be presented in dropdown menus via a `spoiler` -tag. They will join to this discussion block, so you can give a general overview -of the software used in this lesson here and fill out the individual operating -systems (and potentially add more, e.g. online setup) in the solutions blocks. - -::::::::::::::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::: spoiler +To ensure you have the proper environment for creating and customizing graphics in R, it's essential to set up RStudio correctly. Begin by downloading and installing R from the Comprehensive R Archive Network (CRAN) website, followed by RStudio from the official RStudio website. Choose a CRAN mirror closest to your location and download the appropriate version of R for your operating system (Windows, macOS, or Linux), and follow the installation instructions to complete the setup. -### Windows +Once installed, launch RStudio and install essential packages through the console by running `install.packages(c("ggplot2", "dplyr"))`. -Use PuTTY +:::::::::::::::::: discussion +### R Console +This should appear on the left corner on your screen inside RStudio. +![](fig/console-completion.png){alt="Console"} -:::::::::::::::::::::::: +:::::::::::::::::: -:::::::::::::::: spoiler +Set your working directory using `setwd("path/to/your/directory")`, and create a new script file for your projects. Load the necessary libraries at the beginning of your script with `library(ggplot2)` and `library(dplyr)`. -### MacOS +Now, inside this newly created R script, we can begin our lesson. -Use Terminal.app -:::::::::::::::::::::::: -:::::::::::::::: spoiler -### Linux -Use Terminal -::::::::::::::::::::::::