app-import.qmd

# Data Import {#sec-data}

THIS APPENDIX IS BEING UPDATED...

## Intended Learning Outcomes {#sec-ilo-data .unnumbered}

* Be able to inspect data
* Be able to import data from a range of sources
* Be able to identify and handle common problems with data import

## Walkthrough video {#sec-walkthrough-data .unnumbered}

There is a walkthrough video of this chapter available via [Echo360.](https://echo360.org.uk/media/52d2249e-a737-42b4-bf55-267e39fc05c5/public) Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.

## Set-up {#sec-setup-data}

Create a new project for the work we'll do in this chapter named `r path("app-data-import")`. Then, create and save a new R Markdown document named `data.Rmd`, get rid of the default template text, and load the packages in the set-up code chunk. You should have all of these packages installed already, but if you get the message `Error in library(x) : there is no package called ‘x’`, please refer to @sec-install-package.

```{r setup-data, message=FALSE, verbatim="r setup, include=FALSE"}
library(tidyverse)     # includes readr & tibble
library(rio)           # for almost any data import/export
library(haven)         # for SPSS, Stata,and SAS files
library(readxl)        # for Excel files
library(googlesheets4) # for Google Sheets
```

We'd recommend making a new code chunk for each different activity, and using the white space to make notes on any errors you make, things you find interesting, or questions you'd like to ask the course team.

Download the [Data import cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf).

## Built-in data {#sec-builtin}

You'll likely want to import you own data to work with, however, Base R also comes with built-in datasets and these can be very useful for learning new functions and packages. Additionally, some packages, like <pkg>tidyr</pkg>, also contain data. The `data()` function lists the datasets available.

```{r built-in-data, eval = FALSE}
# list datasets built in to base R
data()

# lists datasets in a specific package
data(package = "tidyr")
```

Type the name of a dataset into the `r glossary("console")` to see the data. For example, type `?table1` into the console to see the dataset description for `table1`, which is a dataset included with <pkg>tidyr</pkg>.

```{r, eval = FALSE}
?table1
```

You can also use the `data()` function to load a dataset into your `r glossary("global environment")`.

```{r}
# loads table1 into the environment
data("table1")
```


## Looking at data

Now that you've loaded some data, look the upper right hand window of RStudio, under the Environment tab. You will see the object `table1` listed, along with the number of observations (rows) and variables (columns). This is your first check that everything went OK.

**Always, always, always, look at your data once you've created or loaded a table**. Also look at it after each step that transforms your table. There are three main ways to look at your table: `View()`, `print()`, `tibble::glimpse()`. 

### View() 

A familiar way to look at the table is given by `View()` (uppercase 'V'), which opens up a data table in the console pane using a viewer that looks a bit like Excel. This command can be useful in the console, but don't ever put this one in a script because it will create an annoying pop-up window when the user runs it. You can also click on an object in the  `r glossary("panes", "environment pane")` to open it in the same interface. You can close the tab when you're done looking at it; it won't remove the object.

```{r, eval = FALSE}
View(table1)
```


### print() 

The `print()` method can be run explicitly, but is more commonly called by just typing the variable name on a blank line. The default is not to print the entire table, but just the first 10 rows. 

Let's look at the `table1` table that we loaded above. Depending on how wide your screen is, you might need to click on an arrow at the right of the table to see the last column. 

```{r print, eval = FALSE}
# call print explicitly
print(table1)

# more common method of just calling object name
table1
```

```{r print2, echo = FALSE}
# more common method of just calling object name
table1
```

### glimpse() 

The function `tibble::glimpse()` gives a sideways version of the table. This is useful if the table is very wide and you can't easily see all of the columns. It also tells you the `r glossary("data type")` of each column in angled brackets after each column name. 

```{r sw_glimpse}
glimpse(table1)
```

### summary() {#sec-summary-function}

You can get a quick summary of a dataset with the `summary()` function, which can be useful for spotting things like if the minimum or maximum values are clearly wrong, or if R thinks that a `r glossary("nominal")` variable is `r glossary("numeric")`. For example, if you had labelled gender as 1, 2, and 3 rather than male, female, and non-binary, `summary()` would calculate a mean and median even though this isn't appropriate for the data. This can be a useful flag that you need to take further steps to correct your data. 

Note that because `population` is a very, very large number, R will use [scientific notation](https://courses.lumenlearning.com/waymakerintermediatealgebra/chapter/read-writing-scientific-notation-2/). 

```{r}
summary(table1)
```


## Importing data {#sec-import_data}

Built-in data are nice for examples, but you're probably more interested in your own data. There are many different types of files that you might work with when doing data analysis. These different file types are usually distinguished by the three-letter `r glossary("extension")` following a period at the end of the file name (e.g., `.xls`). 

Download this [directory of data files](data/data.zip), unzip the folder, and save the `data` directory in the `04-data` project directory.

```{r importing-data-setup, include=FALSE}
demo <- tibble(
  character = LETTERS[1:6],
  factor = factor(rep(c("high", "low", "med"), 2), 
                  levels = c("low", "med", "high")),
  integer = 1:6,
  double = 1.5:6.5,
  logical = c(T, T, F, F, NA, T),
  date = lubridate::today() - 0:5
)

export(demo, "data/demo.csv")
export(demo, "data/demo.tsv")
export(demo, "data/demo.xlsx", overwrite = TRUE)
export(demo, "data/demo.sav")
export(demo, "data/demo.json")
```


### rio::import()  

The type of data files you have to work with will likely depend on the software that you typically use in your workflow. The <pkg>rio</pkg> package has very straightforward functions for reading and saving data in most common formats: `rio::import()` and `rio::export()`. 

```{r}
demo_tsv  <- import("data/demo.tsv")  # tab-separated values
demo_csv  <- import("data/demo.csv")  # comma-separated values
demo_xls  <- import("data/demo.xlsx") # Excel format
demo_sav  <- import("data/demo.sav")  # SPSS format
```


### File type specific import 

However, it is also useful to know the specific functions that are used to import different file types because it is easier to discover features to deal with complicated cases, such as when you need to skip rows, rename columns, or choose which Excel sheet to use.

```{r, message=FALSE}
demo_tsv <- readr::read_tsv("data/demo.tsv")
demo_csv <- readr::read_csv("data/demo.csv")
demo_xls <- readxl::read_excel("data/demo.xlsx")
demo_sav <- haven::read_sav("data/demo.sav")
```

::: {.callout-note .try}
Look at the help for each function above and read through the Arguments section to see how you can customise import.
:::

If you keep data in Google Sheets, you can access it directly from R using `<pkg>googlesheets4", "https://googlesheets4.tidyverse.org/")`. The code below imports data from a [public sheet](https://docs.google.com/spreadsheets/d/16dkq0YL0J7fyAwT1pdgj1bNNrheckAU_2-DKuuM6aGI){target="_blank"}. You can set the `ss` argument to the entire `r glossary("URL")` for the target sheet, or just the section after "https://docs.google.com/spreadsheets/d/".

```{r, message = FALSE}
gs4_deauth() # skip authorisation for public data

demo_gs4  <- googlesheets4::read_sheet(
  ss = "16dkq0YL0J7fyAwT1pdgj1bNNrheckAU_2-DKuuM6aGI"
)
```


### Column data types {#sec-col_types}

Use `glimpse()` to see how these different functions imported the data with slightly different data types. This is because the different file types store data slightly differently. For example, SPSS stores factors as numbers, so the `factor` column contains the values 1, 2, 3 rather than `low`, `med`, `high`. It also stores `r glossary("logical")` values as 0 and 1 instead or TRUE and FALSE.

```{r}
glimpse(demo_csv)
```

```{r}
glimpse(demo_xls)
```

```{r}
glimpse(demo_sav)
```

```{r}
glimpse(demo_gs4)
```

The <pkg>readr</pkg> functions display a message when you import data explaining what `r glossary("data type")` each column is.

```{r}
demo <- readr::read_csv("data/demo.csv")
```

The "Column specification" tells you which `r glossary("data type")` each column is. You can review data types in @sec-data-types. Options are:

* `chr`: `r glossary("character")`
* `dbl`: `r glossary("double")`
* `lgl`: `r glossary("logical")`
* `int`: `r glossary("integer")`
* `date`: date
* `dttm`: date/time

`read_csv()` will guess what type of data each variable is and normally it is pretty good at this. However, if it makes a mistake, such as reading the "date" column as a `r glossary("character")`, you can manually set the column data types. 

First, run `spec()` on the dataset which will give you the full column specification that you can copy and paste:

```{r}
spec(demo)
```

Then, we create an object using the code we just copied that lists the correct column types. Factor columns will always import as character data types, so you have to set their data type manually with `col_factor()` and set the order of levels with the `levels` argument. Otherwise, the order defaults to the order they appear in the dataset. For our `demo` dataset, we will tell R that the `factor` variable is a factor by using `col_factor()` and we can also specify the order of the levels so that they don't just appear alphabetically. Additionally, we can also specify exactly what format our `date` variable is in using `%Y-%m-%d`.

We then save this column specification to an object, and then add this to the `col_types` argument when we call `read_csv()`.

```{r}
corrected_cols <- cols(
  character = col_character(),
  factor = col_factor(levels = c("low", "med", "high")),
  integer = col_integer(),
  double = col_double(),
  logical = col_logical(),
  date = col_date(format = "%Y-%m-%d")
)

demo <- readr::read_csv("data/demo.csv", col_types = corrected_cols)
```

::: {.callout-note}
For dates, you might need to set the format your dates are in. See `?strptime` for a list of the codes used to represent different date formats. For example, `"%d-%b-%y"` means that the dates are formatted like `31-Jan-21`. 
:::

The functions from <pkg>readxl</pkg> for loading `.xlsx` sheets have a different, more limited way to specify the column types. You will have to convert factor columns and dates using `mutate()`, which you'll learn about in @sec-wrangle, so most people let `read_excel()` guess data types and don't set the `col_types` argument.

For SPSS data, whilst `rio::import()` will just read the numeric values of factors and not their labels, the function `read_sav()` from <pkg>haven</pkg> reads both. However, you have to convert factors from a haven-specific "labelled double" to a factor (we have no idea why haven doesn't do this for you).

```{r}
demo_sav$factor <- haven::as_factor(demo_sav$factor)

glimpse(demo_sav)
```


::: {.callout-note}
The way you specify column types for <pkg>googlesheets4</pkg> is a little different from <pkg>readr</pkg>, although you can also use the shortcodes described in the help for `read_sheet()` with <pkg>readr</pkg> functions. There is currently no column specification for factors.
:::

## Creating data 

If you need to create a small data table from scratch in R, use the `tibble::tibble()` function, and type the data right in. The `tibble` package is part of the `r glossary("tidyverse")` package that we loaded at the start of this chapter. 

Let's create a small table with the names of three [Avatar](https://en.wikipedia.org/wiki/Avatar:_The_Last_Airbender) characters and their bending type. The `tibble()` function takes `r glossary("argument", "arguments")` with the names that you want your columns to have. The values are `r glossary("vector", "vectors")` that list the column values in order.

If you don't know the value for one of the cells, you can enter `r glossary("NA")`, which we have to do for Sokka because he doesn't have any bending ability. If all the values in the column are the same, you can just enter one value and it will be copied for each row.

```{r tibble-define}    
avatar <- tibble(
  name = c("Katara", "Toph", "Sokka"),
  bends = c("water", "earth", NA),
  friendly = TRUE
)

# print it
avatar
```

You can also use the `tibble::tribble()` function to create a table by row, rather than by column. You start by listing the column names, each preceded by a tilde (`~`), then you list the values for each column, row by row, separated by commas (don't forget a comma at the end of each row).

```{r tribble-define}
avatar_by_row <- tribble(
  ~name,    ~bends,  ~friendly,
  "Katara", "water", TRUE,
  "Toph",   "earth", TRUE,
  "Sokka",  NA,      TRUE
)
```

::: {.callout-note}
You don't have to line up the columns in a tribble, but it can make it easier to spot errors.
:::

You may not need to do this very often if you are primarily working with data that you import from spreadsheets, but it is useful to know how to do it anyway.

## Writing data

If you have data that you want to save, use `rio::export()`, as follows.

```{r write_csv, eval = FALSE}
export(avatar, "data/avatar.csv")
```

This will save the data in CSV format to your working directory.

Writing to Google Sheets is a little trickier (if you never use Google Sheets feel free to skip this section). Even if a Google Sheet is publicly editable, you can't add data to it without authorising your account. 

You can authorise interactively using the following code (and your own email), which will prompt you to authorise "Tidyverse API Packages" the first time you do this. If you don't tick the checkbox authorising it to "See, edit, create, and delete all your Google Sheets spreadsheets", the next steps will fail.

```{r, eval = FALSE}
# authorise your account 
# this only needs to be done once per script
gs4_auth(email = "myemail@gmail.com")

# create a new sheet
sheet_id <- gs4_create(name = "demo-file", 
                       sheets = "letters")

# define the data table to save
letter_data <- tibble(
  character = LETTERS[1:5],
  integer = 1:5,
  double = c(1.1, 2.2, 3.3, 4.4, 5.5),
  logical = c(T, F, T, F, T),
  date = lubridate::today()
)

write_sheet(data = letter_data, 
            ss = sheet_id, 
            sheet = "letters")

## append some data
new_data <- tibble(
  character = "F",
  integer = 6L,
  double = 6.6,
  logical = FALSE,
  date = lubridate::today()
)
sheet_append(data = new_data,
             ss = sheet_id,
             sheet = "letters")

# read the data
demo <- read_sheet(ss = sheet_id, sheet = "letters")
```


::: {.callout-note .try}
* Create a new table called `family` with the first name, last name, and age of your family members (biological, adopted, or chosen). 
* Save it to a CSV file called "family.csv". 
* Clear the object from your environment by restarting R or with the code `remove(family)`.
* Load the data back in and view it.

```{r, eval = FALSE, webex.hide="Solution"}
# create the table
family <- tribble(
  ~first_name, ~last_name, ~age,
  "Lisa", "DeBruine", 45,
  "Robbie", "Jones", 14
)

# save the data to CSV
export(family, "data/family.csv")

# remove the object from the environment
remove(family)

# load the data
family <- import("data/family.csv")
```
:::

We'll be working with `r glossary("tabular data")` a lot in this class, but tabular data is made up of `r glossary("vector", "vectors")`, which groups together data with the same basic `r glossary("data type")`. @sec-data-types explains some of this terminology to help you understand the functions we'll be learning to process and analyse data.


## Troubleshooting

What if you import some data and it guesses the wrong column type? The most common reason is that a numeric column has some non-numbers in it somewhere. Maybe someone wrote a note in an otherwise numeric column. Columns have to be all one data type, so if there are any characters, the whole column is converted to character strings, and numbers like `1.2` get represented as `"1.2"`, which will cause very weird errors like `"100" < "9" == TRUE`. You can catch this by using `glimpse()` to check your data.

The data directory you downloaded contains a file called "mess.csv". Let's try loading this dataset.

```{r}
mess <- rio::import("data/mess.csv")
```

When importing goes wrong, it's often easier to fix it using the  specific importing function for that file type (e.g., use `read_csv()` rather than `rio::import()`. This is because the problems tend to be specific to the file format and you can look up the help for these functions more easily. For CSV files, the import function is `readr::read_csv`.

```{r}
# lazy = FALSE loads the data right away so you can see error messages
# this default changed in late 2021 and might change back soon
mess <- read_csv("data/mess.csv", lazy = FALSE)
```

You'll get a warning about parsing issues and the data table is just a single column. View the file `data/mess.csv` by clicking on it in the File pane, and choosing "View File". Here are the first 10 lines. What went wrong?

`r head(mess)`

First, the file starts with a note: "This is my messy dataset" and then a blank line. The first line of data should be the column headings, so we want to skip the first two lines. You can do this with the argument `skip` in `read_csv()`.

```{r mess}
mess <- read_csv("data/mess.csv", 
                 skip = 2,
                 lazy = FALSE)
glimpse(mess)
```

OK, that's a little better, but this table is still a serious mess in several ways:

* `junk` is a column that we don't need
* `order` should be an integer column
* `good` should be a logical column
* `good` uses all kinds of different ways to record TRUE and FALSE values
* `min_max` contains two pieces of numeric information, but is a character column
* `date` should be a date column

We'll learn how to deal with this mess in @sec-tidy and @sec-wrangle, but we can fix a few things by setting the `col_types` argument in `read_csv()` to specify the column types for our two columns that were guessed wrong and skip the "junk" column. The argument `col_types` takes a list where the name of each item in the list is a column name and the value is from the table below. You can use the function, like `col_double()` or the abbreviation, like `"d"`; for consistency with earlier in this chapter we will use the function names. Omitted column names are guessed.

| function | |abbreviation | type |
|:---------|:--------------|:-----|
| col_logical()   | l | logical values |
| col_integer()   | i | integer values |
| col_double()    | d | numeric values |
| col_character() | c | strings |
| col_factor(levels, ordered) | f | a fixed set of values |
| col_date(format = "")     | D | with the locale's date_format |
| col_time(format = "")     | t | with the locale's time_format |
| col_datetime(format = "") | T | ISO8601 date time |
| col_number()    | n | numbers containing the grouping_mark |
| col_skip()      | _, - | don't import this column |
| col_guess()     | ? | parse using the "best" type based on the input |

```{r tidier}
# omitted values are guessed
# ?col_date for format options
ct <- cols(
  junk = col_skip(), # skip this column
  order = col_integer(),
  good = col_logical(),
  date = col_date(format = "%Y-%m-%d")
)

tidier <- read_csv("data/mess.csv", 
                   skip = 2,
                   col_types = ct,
                   lazy = FALSE)
```

You will get a message about parsing issues when you run this that tells you to run the `problems()` function to see a table of the problems. Warnings look scary at first, but always start by reading the message.

```{r, eval = FALSE}
problems()
```


```{r, echo = FALSE}
prob <- tibble(
  row = 3L,
  col = 2L, 
  expected = "an integer", 
  actual = "missing",
  file = "data/mess.csv"
)
prob
```


The output of `problems()` tells you what row (`r prob$row[[1]]`) and column (`r prob$col[[1]]`) the error was found in, what kind of data was expected (`r prob$expected[[1]]`), and what the actual value was (`r prob$actual[[1]]`). If you specifically tell `read_csv()` to import a column as an integer, any characters (i.e., not numbers) in the column will produce a warning like this and then be recorded as `NA`. You can manually set what missing values are recorded as with the `na` argument.

```{r}
tidiest <- read_csv("data/mess.csv", 
                   skip = 2,
                   na = "missing",
                   col_types = ct,
                   lazy = FALSE)
```


Now `order` is an integer variable where any empty cells contain `NA`. The variable `good` is a logical value, where `0` and `F` are converted to `FALSE`, while `1` and `T` are converted to `TRUE`. The variable `date` is a date type (adding leading zeros to the day). We'll learn in later chapters how to fix other problems, such as the `min_max` column containing two different types of data.

```{r tidiest-table, echo = FALSE}
head(tidiest)
```


## Working with real data

It's worth highlighting at this point that working with real data can be difficult because each dataset can be messy in its own way. Throughout this course we will show you common errors and how to fix them, but be prepared that when you start with working your own data, you'll likely come across problems we don't cover in the course and that's just part of joy of learning programming. You'll also get better at looking up solutions using sites like [Stack Overflow](https://stackoverflow.com/) and there's a fantastic [#rstats](https://twitter.com/hashtag/rstats) community on Twitter you can ask for help.

You may also be tempted to fix messy datasets by, for example, opening up Excel and editing them there. Whilst this might seem easier in the short term, there's two serious issues with doing this. First, you will likely work with datasets that have recurring messy problems. By taking the time to solve these problems with code, you can apply the same solutions to a large number of future datasets so it's more efficient in the long run. Second, if you edit the spreadsheet, there's no record of what you did. By solving these problems with code, you do so reproducibly and you don't edit the original data file. This means that if you make an error, you haven't lost the original data and can recover.

## Exercises

For the final step in this chapter, we will create a report using one of the in-built datasets to practice the skills you have used so far. You may need to refer back to previous chapters to help you complete these exercises and you may also want to take a break before you work through this section. We'd also recommend you knit at every step so that you can see how your output changes.

### New Markdown {#sec-exercises-new-rmd-4}

Create and save a new R Markdown document named `starwars_report.Rmd`. In the set-up code chunk load the packages `tidyverse` and `rio`.

We're going to use the built-in `starwars` dataset that contains data about Star Wars characters. You can learn more about the dataset by using the `?help` function.

### Import and export the dataset {#sec-exercises-load}

* First, load the in-built dataset into the environment. Type and run the code to do this in the console; do not save it in your Markdown.  
* Then, export the dataset to a .csv file and save it in your `data` directory. Again, do this in the console.
* Finally, import this version of the dataset using `read_csv()` to an object named `starwars` - you can put this code in your Markdown.

`r hide()`

```{r eval = TRUE}
data(starwars)
export(starwars, "data/starwars.csv")
starwars <- read_csv("data/starwars.csv")
```

`r unhide()`

### Convert column types

* Check the column specification of `starwars`.
* Create a new column specification that lists the following columns as factors: `hair_color`, `skin_color`, `eye_color`, `sex`, `gender`, `homeworld`, and `species` and skips the following columns: `films`, `vehicles`, and `starships` (this is because these columns contain multiple values and are stored as lists, which we haven't covered how to work with). You do not have to set the factor orders (although you can if you wish).
* Re-import the dataset, this time with the corrected column types.

`r hide()`

```{r eval = TRUE}
spec(starwars)
corrected_cols <- cols(
  name = col_character(),
  height = col_double(),
  mass = col_double(),
  hair_color = col_factor(),
  skin_color = col_factor(),
  eye_color = col_factor(),
  birth_year = col_double(),
  sex = col_factor(),
  gender = col_factor(),
  homeworld = col_factor(),
  species = col_factor(),
  films = col_skip(),
  vehicles = col_skip(),
  starships = col_skip()
)

starwars <- read_csv("data/starwars.csv", col_types = corrected_cols)

```

`r unhide()`

### Plots {#sec-exercises-plots}

Produce the following plots and one plot of your own choosing. Write a brief summary of what each plot shows and any conclusions you might reach from the data. 

```{r echo  = FALSE, warning = FALSE}
ggplot(starwars, aes(height)) +
  geom_histogram(binwidth = 25, colour = "black", alpha = .3) +
  labs(title = "Height (cm) distribution of Star Wars Characters") +
  theme_classic() +
  scale_x_continuous(breaks = seq(from = 50, to = 300, by = 25))
```

```{r echo = FALSE, warning = FALSE}
ggplot(starwars, aes(height, mass)) +
  geom_point() +
  labs(title = "Mass (kg) by height (cm) distribution of Star Wars Characters") +
  theme_classic() +
  scale_x_continuous(breaks = seq(from = 0, to = 300, by = 50)) +
  scale_y_continuous(breaks = seq(from = 0, to = 2000, by = 100)) +
  coord_cartesian(xlim = c(0, 300))
```

```{r echo  = FALSE}
ggplot(starwars, aes(x = gender, fill = gender)) +
  geom_bar(show.legend = FALSE, colour = "black") +
  scale_x_discrete(name = "Gender of character", labels = (c("Masculine", "Feminine", "Missing"))) +
  scale_fill_brewer(palette = 2) +
  theme_bw() +
  labs(title = "Number of Star Wars characters of each gender")
```

`r hide()`
```{r eval  = FALSE}
ggplot(starwars, aes(height)) +
  geom_histogram(binwidth = 25, colour = "black", alpha = .3) +
  scale_x_continuous(breaks = seq(from = 50, to = 300, by = 25)) +
  labs(title = "Height (cm) distribution of Star Wars Characters") +
  theme_classic()
```

```{r eval = FALSE}
ggplot(starwars, aes(height, mass)) +
  geom_point() +
  labs(title = "Mass (kg) by height (cm) distribution of Star Wars Characters") +
  theme_classic() +
  scale_x_continuous(breaks = seq(from = 0, to = 300, by = 50)) +
  scale_y_continuous(breaks = seq(from = 0, to = 2000, by = 100)) +
  coord_cartesian(xlim = c(0, 300))
```

```{r eval  = FALSE}
ggplot(starwars, aes(x = gender, fill = gender)) +
  geom_bar(show.legend = FALSE, colour = "black") +
  scale_x_discrete(name = "Gender of character", labels = (c("Masculine", "Feminine", "Missing"))) +
  scale_fill_brewer(palette = 2) +
  labs(title = "Number of Star Wars characters of each gender") +
  theme_bw()
```

`r unhide()`

### Make it look nice

* Add at least one Star Wars related image from an online source
* Hide the code and any messages from the knitted output
* Resize any images as you see fit

`r hide()`

```{r, eval = FALSE, verbatim='r, echo = FALSE, out.width = "50%", fig.cap="Adaptation of Star Wars logo created by Weweje; original logo by Suzy Rice, 1976. CC-BY-3.0"'}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/Star_wars2.svg/2880px-Star_wars2.svg.png")
```

```{r, out.width = "50%", echo = FALSE, fig.cap="Adaptation of Star Wars logo created by Weweje; original logo by Suzy Rice, 1976. CC-BY-3.0"}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/Star_wars2.svg/2880px-Star_wars2.svg.png")
```

`r unhide()`


## Glossary {#sec-glossary-data}

```{r, echo = FALSE, results='asis'}
glossary_table()
```

## Further resources {#sec-resources-data}

* [Data import cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf)
* [Chapter 11: Data Import](http://r4ds.had.co.nz/data-import.html) in *R for Data Science*
* [Multi-row headers](https://psyteachr.github.io/tutorials/multi-row-headers.html)


```{r, include = FALSE}
# clean up temp datasets
files <- c("data/avatar_na.csv", "data/family.csv", "data/starwars.csv")

file.exists(files) %>%
  `[`(files, .) %>%
  file.remove()
```