Commit fe54134

Update Chapter_04.md
updates
1 parent 99f4abe commit fe54134

1 file changed: +78 -31 lines changed
docs/chapters/Chapter_04.md

Lines changed: 78 additions & 31 deletions
@@ -298,7 +298,7 @@ Each of these options can be used by adding them to the first line of your code
 #| echo: false, eval: true
 cancer_data <- cancer_data |> select(where(~ all(!is.na(.x))))
 head(cancer_data)
-```r
+```
 This will execute the code without displaying the code chunk itself in the final document but will show the output.
 
 ### Naming Your Code Chunk
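
For reference, the chunk options touched by the hunk above can also be written one per `#|` line, which is the YAML form shown in Quarto's documentation. The sketch below is an illustration only: the label `preview-cars` and the built-in `mtcars` data are assumptions and do not come from the chapter.

````markdown
```{r}
#| label: preview-cars
#| echo: false
#| eval: true
# `preview-cars` is a hypothetical label; labels must be unique in the document.
# `mtcars` ships with R and stands in for the chapter's data.
head(mtcars)
```
````

With `echo: false` and `eval: true`, the rendered document would show the table produced by `head()` but not the code that produced it.
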
@@ -309,7 +309,7 @@ While optional, naming your code chunks can greatly enhance the manageability an
 #| label: my-chunk-name
 cancer_data <- cancer_data |> select(where(~ all(!is.na(.x))))
 head(cancer_data)
----
+```
 
 To label a chunk, use the syntax `#| label: chunk-label`, ensuring each label is unique within your document. Naming a chunk allows you to reference its output elsewhere in your document, making your work more organized and navigable.
 
@@ -448,56 +448,103 @@ format:
 df-print: paged
 editor: visual
 ---
+```
 
-**Figure 2**: how the Quarto HTML document head looks
-![Screenshot of my chart](https://github.com/elixir-europe-training/ELIXIR-TrP-LiterateProgrammingR-CodeRep/blob/main/docs/chapters/Figure_CodRep/CodeRep1.png)
-</details>
+**Figure 2**: Example of how the Quarto HTML document head looks.
+![Quarto HTML Document Head](https://github.com/elixir-europe-training/ELIXIR-TrP-LiterateProgrammingR-CodeRep/blob/main/docs/chapters/Figure_CodRep/CodeRep1.png)
 
-## Step 1: Import Required Packages
-In this step, create a code chunk that imports the necessary packages: `tibble`, `dplyr`, `readr`, `ggplot2`, `caret`, `ROCR`, and `pROC`.
+### Step 1: Import Required Packages
 
-<details>
-<summary><strong>Solution: Import Required Packages (Step 1)</strong></summary>
+In this step, we'll ensure that all necessary R packages are loaded for our analysis. These packages provide functions for data manipulation, visualization, and analysis.
 
-```R
+**Task**: Incorporate the provided code into your Quarto document to load the necessary R packages for our analysis.
+
+<details>
+<summary><strong>Exercise</strong></summary>
+<p>
+Import Required Packages (Step 1)
+
+```r
 # Load the required packages
-# Make sure to install them first if you haven't already
-library(tibble) # Provides a modern, tidy alternative to data frames.
-library(dplyr) # Data manipulation.
-library(readr) # Reading CSV file data.
-library(ggplot2) # Plotting system.
-library(caret) # Machine learning.
-library(ROCR) # Evaluating and visualizing the performance of binary classifiers.
-library(pROC) # Evaluating and visualizing the performance of binary and multi-class classifiers using ROC analysis.
-theme_set(theme_bw(12))
+library(tibble) # For data frames.
+library(dplyr) # For data manipulation.
+library(readr) # For reading CSV files.
+library(ggplot2) # For data visualization.
+library(caret) # For machine learning.
+library(ROCR) # For ROC curves.
+library(pROC) # For AUC and ROC analysis.
+theme_set(theme_bw(12)) # Set a theme for ggplot2.
 knitr::opts_chunk$set(fig.align = "center")
-
 ```
-**Figure 3**: how the Quarto HTML importing document looks
-![Screenshot of my chart](https://github.com/elixir-europe-training/ELIXIR-TrP-LiterateProgrammingR-CodeRep/blob/main/docs/chapters/Figure_CodRep/CoderRep2.png)
+</p>
 </details>
 
-## Step 2: Insert text and code
-Objective: Integrate the provided text and code into your Quarto document.
+<details>
+<summary><strong>Solution</strong></summary>
+<p>
+**Figure 3**: Example of importing packages in Quarto.
+![Importing Packages in Quarto](https://github.com/elixir-europe-training/ELIXIR-TrP-LiterateProgrammingR-CodeRep/blob/main/docs/chapters/Figure_CodRep/CoderRep2.png)
+</details>
+
+
+
+### Step 2: Insert Text and Code
 
-Instructions:
+Now, let's dive into integrating both text and code into your Quarto document to begin our analysis on the Breast Cancer Wisconsin dataset.
 
-1. Incorporate the given text and code snippets into the body of your Quarto document.
-2. As you insert key points or emphasize specific terms, utilize Bold, Italic, or any other relevant styles to enhance readability and highlight importance.
+**Objective**: Seamlessly blend textual explanations with R code to analyze the dataset.
 
+**Instructions**:
+
+1. Add the provided text and corresponding R code snippets into the body of your Quarto document.
+2. Emphasize key points or terms using Markdown formatting (e.g., **bold**, *italic*).
 
-**Text**: "Now we read the data, available as a local csv file in the relative path (`breast-cancer-wisconsin/`) below. We use various functions to have a glimpse of its structure and dimensions. We also change the `diagnosis` variable to a factor."
+**Text**: "Now we read the data, which is available as a CSV file in the relative path `breast-cancer-wisconsin/`. Using various R functions, we'll have a glimpse of its structure and dimensions. We also convert the `diagnosis` variable to a factor, facilitating further analysis."
 
-**Code**:
+**Code**:
+```r
 cancer_data <- as_tibble(read.csv("data/breast-cancer-wisconsin.csv"))
 head(cancer_data)
 cancer_data$diagnosis <- as.factor(cancer_data$diagnosis)
 colnames(cancer_data)
 dim(cancer_data)
 
-**Text**
-"Echoing the dimensions printed in the output above, this data frame has `r nrow(cancer_data)` rows and `r ncol(cancer_data)` columns. Except for the first two columns, the remaining columns are features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image."
+**Text**:
+"Reflecting on the dimensions displayed above, this data frame consists of `r nrow(cancer_data)` rows and `r ncol(cancer_data)` columns. Except for the first two columns, the remaining columns are features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, describing characteristics of the cell nuclei present in the image."
+
+**Follow-Up Code**:
+```r
+cancer_data <- cancer_data |> select(where(~ all(!is.na(.x))))
+head(cancer_data)
+```
+After removing columns with missing values, our dataset is now more streamlined for analysis. This preprocessing step is crucial for ensuring the accuracy of our subsequent analyses.
+
+## Data Exploration and Analysis
 
+### Exercise: Visualizing Data
+
+**Objective**: Your task is to generate a scatter plot that examines the relationship between `mean_radius` and `mean_texture` of tumor cells. This visualization should help us understand if there's a visual pattern that distinguishes benign from malignant tumors based on these two features.
+
+**Instructions**:
+
+1. Utilize the `ggplot2` package to create a scatter plot.
+2. The plot should have `mean_radius` on the x-axis and `mean_texture` on the y-axis.
+3. Color-code the points based on the `diagnosis` to distinguish between benign and malignant tumors.
+
+**Your Task**:
+```r
+# Write your ggplot2 code here to create the scatter plot
+
+
+**Code**:
+```r
+library(ggplot2)
+ggplot(cancer_data, aes(x = mean_radius, y = mean_texture, color = diagnosis)) +
+geom_point() +
+theme_minimal() +
+labs(title = "Scatter Plot of Mean Radius vs. Mean Texture",
+x = "Mean Radius",
+y = "Mean Texture")
 We use the following code to remove columns with missing values (`NA`), and have a glimpse of the remaining columns again.
 
 **Code**
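
To make the column-filtering step in the hunks above concrete, here is a minimal sketch of what `select(where(~ all(!is.na(.x))))` does. It runs on a small made-up tibble rather than the chapter's dataset: the name `toy` and all of its columns are hypothetical, and the native pipe `|>` assumes R 4.1 or later.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data: `partial` has one missing value, `empty` is all NA.
toy <- tibble(
  id      = 1:3,
  score   = c(0.5, 1.2, 3.4),
  partial = c(1, NA, 3),
  empty   = c(NA, NA, NA)
)

# Keep only the columns in which every value is non-missing.
complete_cols <- toy |> select(where(~ all(!is.na(.x))))

colnames(complete_cols)
```

In this sketch both `partial` and `empty` are dropped: the predicate removes any column with at least one `NA`, not only columns that are entirely missing. Rows are left untouched.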
