Fix typos for Starting with data carpentries-incubator#31
chiasinL committed Feb 21, 2023
1 parent 934d736 commit 54b73c3
Showing 1 changed file with 25 additions and 25 deletions.
50 changes: 25 additions & 25 deletions episodes/25-starting-with-data.Rmd
@@ -40,7 +40,7 @@ intranasal route and transcriptomic changes in the cerebellum and
spinal cord tissues were evaluated by RNA-seq at days 0
(non-infected), 4 and 8.

-The dataset is stored as a comma separated value (CSV) file. Each row
+The dataset is stored as a comma-separated values (CSV) file. Each row
holds information for a single RNA expression measurement, and the first eleven
columns represent:

@@ -84,7 +84,7 @@ rna <- read.csv("data/rnaseq.csv")
This statement doesn't produce any output because, as you might
recall, assignments don't display anything. If we want to check that
our data has been loaded, we can see the contents of the data frame by
-typing its name
+typing its name:

```{r, eval=FALSE}
rna
@@ -142,7 +142,7 @@ columns are vectors, each column must contain a single type of data
depicting a data frame comprising a numeric, a character, and a
logical vector.

-![](./figs/data-frame.svg)
+![](./fig/data-frame.svg)

We can see this when inspecting the <b>str</b>ucture of a data frame
with the function `str()`:
@@ -160,28 +160,28 @@ content/structure of the data. Let's try them out!

**Size**:

-- `dim(rna)` - returns a vector with the number of rows in the first
+- `dim(rna)` - returns a vector with the number of rows as the first
element, and the number of columns as the second element (the
-**dim**ensions of the object)
-- `nrow(rna)` - returns the number of rows
-- `ncol(rna)` - returns the number of columns
+**dim**ensions of the object).
+- `nrow(rna)` - returns the number of rows.
+- `ncol(rna)` - returns the number of columns.

**Content**:

-- `head(rna)` - shows the first 6 rows
-- `tail(rna)` - shows the last 6 rows
+- `head(rna)` - shows the first 6 rows.
+- `tail(rna)` - shows the last 6 rows.

**Names**:

- `names(rna)` - returns the column names (synonym of `colnames()` for
-`data.frame` objects)
-- `rownames(rna)` - returns the row names
+`data.frame` objects).
+- `rownames(rna)` - returns the row names.

**Summary**:

- `str(rna)` - structure of the object and information about the
-class, length and content of each column
-- `summary(rna)` - summary statistics for each column
+class, length and content of each column.
+- `summary(rna)` - summary statistics for each column.

Note: most of these functions are "generic", they can be used on other types of
objects besides `data.frame`.
@@ -211,9 +211,9 @@ questions?

## Indexing and subsetting data frames

-Our `rna` data frame has rows and columns (it has 2 dimensions), if we
+Our `rna` data frame has rows and columns (it has 2 dimensions); if we
want to extract some specific data from it, we need to specify the
-"coordinates" we want from it. Row numbers come first, followed by
+"coordinates" we want. Row numbers come first, followed by
column numbers. However, note that different ways of specifying these
coordinates lead to results with different classes.

@@ -246,7 +246,7 @@ rna[, -1] ## The whole data frame, except the first column
rna[-c(7:66465), ] ## Equivalent to head(rna)
```
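
As a minimal sketch of how the choice of "coordinates" changes the class of the result, the same kinds of index can be tried on a small stand-in data frame. The object `toy` and its values are made up for illustration and are not taken from `rna`.

```r
# Hypothetical three-row stand-in for `rna` (values invented for illustration)
toy <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  expression = c(10, 250, 42),
  infected = c(TRUE, TRUE, FALSE)
)

class(toy[1, 2])    # one cell: a length-one vector ("numeric")
class(toy[, 2])     # one column selected by position: a vector ("numeric")
class(toy[2])       # one column with a single index: still a "data.frame"
class(toy[1:2, ])   # a subset of rows: a "data.frame"
class(toy[, -1])    # everything except the first column: a "data.frame"

# Negative row indices drop rows, so this keeps the same rows as head(toy, 2)
all.equal(toy[-3, ], head(toy, 2))
```

The pattern mirrors `rna[-c(7:66465), ]` above: excluding every row after the sixth leaves the rows that `head()` would show.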

-Data frames can be subset by calling indices (as shown previously),
+Data frames can be subsetted by calling indices (as shown previously),
but also by calling their column names directly:

```{r, eval=FALSE}
@@ -349,8 +349,8 @@ In R's memory, these factors are represented by integers (1, 2, 3),
but are more informative than integers because factors are self
describing: `"female"`, `"male"` is more descriptive than `1`,
`2`. Which one is "male"? You wouldn't be able to tell just from the
-integer data. Factors, on the other hand, have this information built
-in. It is particularly helpful when there are many levels (like the
+integer data. Factors, on the other hand, have this information built-in.
+It is particularly helpful when there are many levels (like the
gene biotype in our example dataset).

When your data is stored as a factor, you can use the `plot()`
@@ -480,7 +480,7 @@ Check your guesses using `str(country_climate)`:
- Are they what you expected? Why? Why not?

- Try again by adding `stringsAsFactors = TRUE` after the last
-variable when creating the data frame? What is happening now?
+variable when creating the data frame. What is happening now?
`stringsAsFactors` can also be set when reading text-based
spreadsheets into R using `read.csv()`.

@@ -524,7 +524,7 @@ tutorial](https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data

## Matrices

-Before proceeding, now that we have learnt about dataframes, let's
+Before proceeding, now that we have learnt about data frames, let's
recap package installation and learn about a new data type, namely the
`matrix`. Like a `data.frame`, a matrix has two dimensions, rows and
columns. But the major difference is that all cells in a `matrix` must
@@ -632,11 +632,11 @@ about pitfalls of dates with spreadsheets.
We are going to use the `ymd()` function from the package
**`lubridate`** (which belongs to the **`tidyverse`**; learn more
[here](https://www.tidyverse.org/)). . **`lubridate`** gets installed
-as part as the **`tidyverse`** installation. When you load the
+as part of the **`tidyverse`** installation. When you load the
**`tidyverse`** (`library(tidyverse)`), the core packages (the
packages used in most data analyses) get loaded. **`lubridate`**
however does not belong to the core tidyverse, so you have to load it
-explicitly with `library(lubridate)`
+explicitly with `library(lubridate)`.

Start by loading the required package:

@@ -710,15 +710,15 @@ order. If you have for instance day, month and year, you would need
`dmy()`.

```{r}
-dmy(paste(x$day, x$month, x$month, sep = "-"))
+dmy(paste(x$day, x$month, x$year, sep = "-"))
```
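
The day-month-year call can be tried on a small made-up data frame. This is only a sketch: the object `x` below is a hypothetical stand-in with invented values, and it assumes the **`lubridate`** package is installed.

```r
library(lubridate)

# Hypothetical stand-in for `x`: separate year, month and day columns
x <- data.frame(year = c(1996, 1992), month = c(8, 11), day = c(22, 9))

# Paste the pieces in day-month-year order, then parse with the matching parser
dates <- dmy(paste(x$day, x$month, x$year, sep = "-"))
dates
class(dates)   # "Date"
```

The order of the pasted columns has to match the parser name: `dmy()` expects day, then month, then year.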

`lubdridate` has many functions to address all date variations.

## Summary of R objects

So far, we have seen several types of R object varying in the number
-of dimensions and whether they could store a single of multiple data
+of dimensions and whether they could store a single or multiple data
types:

- **`vector`**: one dimension (they have a length), single type of data.
@@ -747,7 +747,7 @@ str(l)
```

List subsetting is done using `[]` to subset a new sub-list or `[[]]`
-to extract a single element of that list (using indices or names, of
+to extract a single element of that list (using indices or names, if
the list is named).
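
The difference between the two operators can be sketched on a small named list. The object `lst` and its contents are hypothetical and are not the list `l` used in the lesson.

```r
# Hypothetical named list holding elements of different types
lst <- list(numbers = 1:3, letters = c("a", "b"), flag = TRUE)

class(lst[1])              # "list": single brackets return a sub-list
class(lst[[1]])            # "integer": double brackets return the element itself
lst[["letters"]]           # extract one element by name
lst$flag                   # `$` is shorthand for `[[ ]]` with a name
lst[c("numbers", "flag")]  # several elements at once, still a list
```

Single brackets always give back a list, even for one element, which is why `[[ ]]` is needed before the contents can be used as a vector.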

```{r}