Fix typos for Starting with data carpentries-incubator#31

chiasinL · Feb 21, 2023 · 54b73c3
1 parent 934d736
commit 54b73c3
Showing 1 changed file with 25 additions and 25 deletions.
diff --git a/episodes/25-starting-with-data.Rmd b/episodes/25-starting-with-data.Rmd
@@ -40,7 +40,7 @@ intranasal route and transcriptomic changes in the cerebellum and
 spinal cord tissues were evaluated by RNA-seq at days 0
 (non-infected), 4 and 8.
 
-The dataset is stored as a comma separated value (CSV) file.  Each row
+The dataset is stored as a comma-separated values (CSV) file.  Each row
 holds information for a single RNA expression measurement, and the first eleven
 columns represent:
 
@@ -84,7 +84,7 @@ rna <- read.csv("data/rnaseq.csv")
 This statement doesn't produce any output because, as you might
 recall, assignments don't display anything. If we want to check that
 our data has been loaded, we can see the contents of the data frame by
-typing its name
+typing its name:
 
 ```{r, eval=FALSE}
 rna
@@ -142,7 +142,7 @@ columns are vectors, each column must contain a single type of data
 depicting a data frame comprising a numeric, a character, and a
 logical vector.
 
-![](./figs/data-frame.svg)
+![](./fig/data-frame.svg)
 
 We can see this when inspecting the <b>str</b>ucture of a data frame
 with the function `str()`:
@@ -160,28 +160,28 @@ content/structure of the data. Let's try them out!
 
 **Size**:
 
-- `dim(rna)` - returns a vector with the number of rows in the first
+- `dim(rna)` - returns a vector with the number of rows as the first
   element, and the number of columns as the second element (the
-  **dim**ensions of the object)
-- `nrow(rna)` - returns the number of rows
-- `ncol(rna)` - returns the number of columns
+  **dim**ensions of the object).
+- `nrow(rna)` - returns the number of rows.
+- `ncol(rna)` - returns the number of columns.
 
 **Content**:
 
-- `head(rna)` - shows the first 6 rows
-- `tail(rna)` - shows the last 6 rows
+- `head(rna)` - shows the first 6 rows.
+- `tail(rna)` - shows the last 6 rows.
 
 **Names**:
 
 - `names(rna)` - returns the column names (synonym of `colnames()` for
-  `data.frame` objects)
-- `rownames(rna)` - returns the row names
+  `data.frame` objects).
+- `rownames(rna)` - returns the row names.
 
 **Summary**:
 
 - `str(rna)` - structure of the object and information about the
-  class, length and content of each column
-- `summary(rna)` - summary statistics for each column
+  class, length and content of each column.
+- `summary(rna)` - summary statistics for each column.
 
 Note: most of these functions are "generic", they can be used on other types of
 objects besides `data.frame`.
@@ -211,9 +211,9 @@ questions?
 
 ## Indexing and subsetting data frames
 
-Our `rna` data frame has rows and columns (it has 2 dimensions), if we
+Our `rna` data frame has rows and columns (it has 2 dimensions); if we
 want to extract some specific data from it, we need to specify the
-"coordinates" we want from it. Row numbers come first, followed by
+"coordinates" we want. Row numbers come first, followed by
 column numbers. However, note that different ways of specifying these
 coordinates lead to results with different classes.
 
@@ -246,7 +246,7 @@ rna[, -1]          ## The whole data frame, except the first column
 rna[-c(7:66465), ] ## Equivalent to head(rna)
 ```
 
-Data frames can be subset by calling indices (as shown previously),
+Data frames can be subsetted by calling indices (as shown previously),
 but also by calling their column names directly:
 
 ```{r, eval=FALSE}
@@ -349,8 +349,8 @@ In R's memory, these factors are represented by integers (1, 2, 3),
 but are more informative than integers because factors are self
 describing: `"female"`, `"male"` is more descriptive than `1`,
 `2`. Which one is "male"?  You wouldn't be able to tell just from the
-integer data. Factors, on the other hand, have this information built
-in. It is particularly helpful when there are many levels (like the
+integer data. Factors, on the other hand, have this information built-in. 
+It is particularly helpful when there are many levels (like the
 gene biotype in our example dataset).
 
 When your data is stored as a factor, you can use the `plot()`
@@ -480,7 +480,7 @@ Check your guesses using `str(country_climate)`:
 - Are they what you expected?  Why? Why not?
 
 - Try again by adding `stringsAsFactors = TRUE` after the last
-  variable when creating the data frame? What is happening now?
+  variable when creating the data frame. What is happening now?
   `stringsAsFactors` can also be set when reading text-based
   spreadsheets into R using `read.csv()`.
 
@@ -524,7 +524,7 @@ tutorial](https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data
 
 ## Matrices
 
-Before proceeding, now that we have learnt about dataframes, let's
+Before proceeding, now that we have learnt about data frames, let's
 recap package installation and learn about a new data type, namely the
 `matrix`. Like a `data.frame`, a matrix has two dimensions, rows and
 columns. But the major difference is that all cells in a `matrix` must
@@ -632,11 +632,11 @@ about pitfalls of dates with spreadsheets.
 We are going to use the `ymd()` function from the package
 **`lubridate`** (which belongs to the **`tidyverse`**; learn more
 [here](https://www.tidyverse.org/)). . **`lubridate`** gets installed
-as part as the **`tidyverse`** installation. When you load the
+as part of the **`tidyverse`** installation. When you load the
 **`tidyverse`** (`library(tidyverse)`), the core packages (the
 packages used in most data analyses) get loaded. **`lubridate`**
 however does not belong to the core tidyverse, so you have to load it
-explicitly with `library(lubridate)`
+explicitly with `library(lubridate)`.
 
 Start by loading the required package:
 
@@ -710,15 +710,15 @@ order. If you have for instance day, month and year, you would need
 `dmy()`.
 
 ```{r}
-dmy(paste(x$day, x$month, x$month, sep = "-"))
+dmy(paste(x$day, x$month, x$year, sep = "-"))
 ```
 
 `lubdridate` has many functions to address all date variations.
 
 ## Summary of R objects
 
 So far, we have seen several types of R object varying in the number
-of dimensions and whether they could store a single of multiple data
+of dimensions and whether they could store a single or multiple data
 types:
 
 - **`vector`**: one dimension (they have a length), single type of data.
@@ -747,7 +747,7 @@ str(l)
 ```
 
 List subsetting is done using `[]` to subset a new sub-list or `[[]]`
-to extract a single element of that list (using indices or names, of
+to extract a single element of that list (using indices or names, if
 the list is named).
 
 ```{r}