Merge pull request #16 from 3mmaRand/babs4/drafting
Babs4/drafting
3mmaRand authored Jan 29, 2024
2 parents fd131f3 + df57686 commit 354678c
Showing 7 changed files with 185 additions and 33 deletions.
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -178,7 +178,7 @@ website:
text: About
- href: r4babs4/week-1/study_before_workshop.qmd
text: Prepare!
- href: r4babs2/week-1/workshop.qmd
- href: r4babs4/week-1/workshop.qmd
text: Workshop
- href: r4babs4/week-1/study_after_workshop.qmd
text: Consolidate!
Binary file added r4babs4/week-1/images/future_you.png
25 changes: 14 additions & 11 deletions r4babs4/week-1/overview.qmd
@@ -5,34 +5,37 @@ toc: true
toc-location: right
---

This week we will cover




This week the independent study to be done before the workshop is revision of some stage 1 core concepts. It covers file types, file systems, working directories, paths and RStudio Projects. You may feel completely confident with them but many students will benefit from a refresher. In the workshop we will....


Workshop: Project organisation, data with many variables and obs, getting an overview with summaries and distribution plots (that's kinda revision too but will seem different to them with more rows/cols), concept of QC, filtering rows
Consolidate: exercises on previous I think


### Learning objectives

The successful student will be able to:

-
- explain the organisation of files and directories in a file system including root, home and working directories (revision)

- explain absolute and relative file paths (revision)

- know how to use a project-oriented workflow to organise work

-
- data with many variables and obs

-
- getting an overview with summaries and distribution plots

-
- concept of QC

-
- filtering rows

### Instructions

1. [Prepare](study_before_workshop.qmd)

i. 📖 Read
i. 📖 Read Understanding file systems (Stage 1 revision).
ii. 📖 Read RStudio projects (Stage 1 revision).


2. [Workshop](workshop.qmd)
21 changes: 19 additions & 2 deletions r4babs4/week-1/study_before_workshop.qmd
@@ -5,8 +5,25 @@ toc: true
toc-location: right
---

1. 📖 Read [...](https://3mmarand.github.io/)
1. 📖 Read [Understanding file systems](https://3mmarand.github.io/comp4biosci/file_systems.html). Approximately 15 - 20 minutes. This is revision of some stage 1 core concepts. It covers file types, file systems, working directories and paths. You may feel completely confident with them but many students will benefit from a refresher.

2. 📖 Read [Confidence Intervals](https://3mmarand.github.io/)
2. 📖 Read [RStudio Projects](https://3mmarand.github.io/comp4biosci/workflow_rstudio.html#rstudio-projects). Section 7.1 only. Approximately 5 - 10 minutes. This is revision but part of the assessment requires that you use an RStudio project so if you aren't sure you are using them, you might want to check.


Entirely optionally, you might want to review some other stage 1 content. You can access these through the past VLE sites but you might find it helpful to use my latest versions because I have improved them, there is no 2FA and the sites are searchable.


Stage 1

- [Data Analysis in R for Becoming a Bioscientist
1](https://3mmarand.github.io/R4BABS/r4babs1/r4babs1.html). Core
concepts about scientific computing, types of variable, the role of
variables in analysis and how to use RStudio to organise analysis
and import, summarise and plot data.

- [Data Analysis in R for Becoming a Bioscientist
2](https://3mmarand.github.io/R4BABS/r4babs2/r4babs2.html). The
logic of hypothesis testing, confidence intervals, what is meant by
a statistical model, two-sample tests and one- and two-way analysis
of variance (ANOVA).

118 changes: 107 additions & 11 deletions r4babs4/week-1/workshop.qmd
@@ -41,31 +41,132 @@ These four symbols are used at the beginning of each instruction so you know whe

![](images/do_on_internet.png) Something you should do in your browser on the internet. It may be searching for information, going to the VLE or downloading a file.

![](images/answer.png) A question for you to think about and answer. Record your answers in your script for future reference.
![](images/answer.png) A question for you to think about and answer. Record your answers for future reference.
:::

# Getting started

## Reproducibility

### Why does it matter?

![futureself, CC-BY-NC, by Julen
Colomb](images/future_you.png){fig-alt="Person working at a computer with an offstage person asking 'How is the analysis going?' The person at the computer replies 'Can't understand the date...and the data collector does not answer my emails or calls' Person offstage: 'That's terrible! So cruel! Who did collect the data? I will sack them!' Person at the computer: 'um...I did, 3 years ago.'"
width="400"}

- Five selfish reasons to work reproducibly [@markowetz2015].
Alternatively, see the very entertaining
[talk](https://youtu.be/yVT07Sukv9Q) which covers the "Duke Scandal".

- Many high-profile cases of work which did not reproduce, e.g. Anil
Potti's work unravelled by @baggerly2009 in the "Duke Scandal"

- **Will** become standard in Science and publishing e.g. OECD Global
Science Forum Building digital workforce capacity and skills for
data-intensive science [@oecdglobalscienceforum2020]

### How to achieve reproducibility

- Scripting

- Organisation: Project-oriented workflows with file and folder
structure, naming things

- Documentation: Comment your code.



### Project-oriented workflow

- use folders to organise your work

- you are aiming for structured, systematic and repeatable.

- inputs and outputs should be clearly identifiable from structure
and/or naming

Example

```
-- stem-cells
   |__stem-cells.Rproj
   |__analysis.R
   |__data-raw
      |__2019-03-21_donor_1.csv
      |__2019-03-21_donor_2.csv
      |__2019-03-21_donor_3.csv
   |__figures
      |__01_volcano_donor_1_vs_donor_2.png
      |__02_volcano_donor_1_vs_donor_3.png
```
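
The layout above can be sketched in R. This is an illustration only, not part of the workshop instructions: it builds the folders inside a temporary directory (an assumption made so that nothing is written to your real work folders) and shows inputs and outputs being named by paths relative to the project folder.

```r
# Sketch only: recreate the example layout in a temporary directory
project <- file.path(tempdir(), "stem-cells")
dir.create(file.path(project, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "figures"), showWarnings = FALSE)

# Inside a real RStudio project you would write only the relative part,
# e.g. "data-raw/2019-03-21_donor_1.csv", never an absolute path
raw_file <- file.path("data-raw", "2019-03-21_donor_1.csv")
fig_file <- file.path("figures", "01_volcano_donor_1_vs_donor_2.png")

# Create an empty placeholder for the raw data file
file.create(file.path(project, raw_file))

# List the files in the project, relative to the project folder
project_files <- list.files(project, recursive = TRUE)
print(project_files)
```

Because the paths are relative, the whole `stem-cells` folder can be moved or shared and the script still finds its files.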

## Naming things

- machine readable

- human readable

- play nicely with sorting

I suggest

- no spaces in names

- use snake_case or kebab-case rather than CamelCase or dot.case

- use all lower case

- ordering: use left-padded numbers e.g., 01, 02....99 or 001,
002....999

- dates: use [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format, e.g.,
2020-10-16
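
A short sketch of why left-padded numbers and ISO 8601 dates "play nicely with sorting". The file names are made up for illustration.

```r
# Left-pad numbers with zeros so that names sort in numerical order
unpadded <- paste0(c(1L, 2L, 10L), "_volcano.png")
padded   <- sprintf("%02d_volcano.png", c(1L, 2L, 10L))

sort(unpadded)  # "10_volcano.png" is placed before "2_volcano.png"
sort(padded)    # "01_...", "02_...", "10_..." as intended

# ISO 8601 dates (YYYY-MM-DD) sort chronologically even as plain text
collected  <- as.Date(c("2020-10-16", "2019-03-21"))
file_names <- paste0(format(collected, "%Y-%m-%d"), "_donor_1.csv")
sort(file_names)
```

Text sorting compares names character by character, which is why `10` lands before `2` unless the numbers are padded to the same width.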

# Exercises

## Getting started


![](images/do_on_your_computer.png) Start RStudio from the Start menu.

![](images/do_in_R.png) Go to the Files tab in the lower right pane and click on the `...` on the right. This will open a "Go to folder" window. Navigate to a place on your computer where you keep your work. Click Open.

![](images/do_in_R.png) Also on the Files tab click on `New Folder`. Type "data-analysis-in-r-4" into the box. This will be the folder that we work in throughout the Data Analysis in R part of BABS4.

![](images/do_in_R.png) Make an RStudio project for this workshop by clicking on the drop-down menu on top right where it says `Project: (None)` and choosing New Project, then New Directory, then New Project. Name the RStudio Project '1-core'.
![](images/do_in_R.png) Make an RStudio project for this workshop by clicking on the drop-down menu on top right where it says `Project: (None)` and choosing New Project, then New Directory, then New Project. Name the RStudio Project 'core-01'.

![](images/do_in_R.png) Make a new script then save it with a name like analysis.R to carry out the rest of the work.

![](images/do_in_R.png) Add a comment to the script: `# Core` and load the **`tidyverse`** [@tidyverse] package

![](images/do_in_R.png) Make a new folder called `data-raw`.
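
As a rough guide, the top of the script might look like the sketch below. It assumes the **`tidyverse`** package is already installed and that the working directory is the RStudio project folder. Creating `data-raw` with `dir.create()` is an alternative to the `New Folder` button, not a required step.

```r
# Core

# Load the tidyverse packages (assumes tidyverse is installed)
library(tidyverse)

# Make a folder for the raw data inside the project if it is not already there
if (!dir.exists("data-raw")) {
  dir.create("data-raw")
}
```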

# Exercises

## Remind yourself how to import files!

[Importing data from files](../../r4babs1/week-8/workshop.html#importing-data-from-files) was covered in BABS 1 [@rand2023] if you need to remind yourself.
## Data with many variables and obs

examine

## Getting an overview

with summaries and distribution plots

summary
summary stats
group_by
distributions
boxplots / violin
facet
think about the number of variables and observations!!
ggpairs
summarise and plot
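
The ideas listed above could be sketched as follows. The data are simulated and the names `cells`, `donor`, `gene_a` and `gene_b` are hypothetical, chosen only so the example runs on its own; they are not the workshop data.

```r
library(tidyverse)

# Hypothetical data: 3 donors, 50 observations each, two measured variables
set.seed(123)
cells <- tibble(
  donor  = rep(c("donor_1", "donor_2", "donor_3"), each = 50),
  gene_a = rnorm(150, mean = 10, sd = 2),
  gene_b = rnorm(150, mean = 5, sd = 1)
)

# How many observations and variables are there?
dim(cells)
glimpse(cells)
summary(cells)

# Summary statistics for each group
cells_summary <- cells |>
  group_by(donor) |>
  summarise(n = n(),
            mean_a = mean(gene_a),
            sd_a = sd(gene_a))
cells_summary

# Distributions: reshape to long form so each variable gets its own facet
cells_long <- cells |>
  pivot_longer(cols = c(gene_a, gene_b),
               names_to = "gene",
               values_to = "expression")

overview_plot <- ggplot(cells_long, aes(x = donor, y = expression)) +
  geom_boxplot() +
  facet_wrap(~gene, scales = "free_y")
overview_plot
```

Swapping `geom_boxplot()` for `geom_violin()` gives violin plots. `ggpairs()` from the **GGally** package draws all pairwise plots in one call, but it becomes hard to read as the number of variables grows.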


## Quality Control

### filtering rows
a particular value
NA
make things zero
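
A minimal sketch of the three filtering notes above, using a small made-up data frame (`readings` and its columns are hypothetical). "Make things zero" is interpreted here as replacing negative values with zero, which is an assumption about what the note means.

```r
library(tidyverse)

# Hypothetical data with a missing value, a negative reading and a failed sample
readings <- tibble(
  sample = c("s1", "s2", "s3", "s4", "s5"),
  status = c("ok", "ok", "failed", "ok", "ok"),
  value  = c(12.1, NA, 9.8, -0.4, 15.3)
)

# Keep only rows with a particular value
passed <- readings |>
  filter(status == "ok")

# Drop rows where the value is missing
complete <- readings |>
  filter(!is.na(value))

# Assumed meaning of "make things zero": set negative values to zero
# (missing values stay missing)
zeroed <- readings |>
  mutate(value = if_else(value < 0, 0, value))
```

Each step creates a new object rather than overwriting `readings`, so the original data remain available for checking how many rows each filter removed.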

```{r}
#| include: false
@@ -78,11 +179,6 @@ These four symbols are used at the beginning of each instruction so you know whe

<!-- #---THINKING ANSWER--- -->

<!-- the 95% confidence interval from 132.75 mu m^2 to 151.95 mu m^2 -->

<!-- doesn't include 155 mu m^2 so we can conclude that DHA deficiency significantly -->

<!-- lowers csa -->

You're finished!

36 changes: 36 additions & 0 deletions references.bib
@@ -140,4 +140,40 @@ @Manual{emmeans
year = {2023},
note = {R package version 1.8.7},
url = {https://CRAN.R-project.org/package=emmeans},
}

@article{baggerly2009,
title = {DERIVING CHEMOSENSITIVITY FROM CELL LINES: FORENSIC BIOINFORMATICS AND REPRODUCIBLE RESEARCH IN HIGH-THROUGHPUT BIOLOGY},
author = {Baggerly, Keith A and Coombes, Kevin R},
year = {2009},
date = {2009},
journal = {Ann. Appl. Stat.},
pages = {1309--1334},
volume = {3},
number = {4},
doi = {10.2307/27801549},
url = {http://www.jstor.org/stable/27801549},
note = {Publisher: Institute of Mathematical Statistics}
}

@techreport{oecdglobalscienceforum2020,
title = {Building digital workforce capacity and skills for data-intensive science},
author = {{OECD Global Science Forum}},
year = {2020},
month = {06},
date = {2020-06-19},
url = {http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/STP/GSF(2020)6/FINAL&docLanguage=En}
}

@article{markowetz2015,
title = {Five selfish reasons to work reproducibly},
author = {Markowetz, Florian},
year = {2015},
month = {12},
date = {2015-12-08},
journal = {Genome Biol.},
pages = {274},
volume = {16},
doi = {10.1186/s13059-015-0850-7},
url = {http://dx.doi.org/10.1186/s13059-015-0850-7}
}
16 changes: 8 additions & 8 deletions update_notes.txt
@@ -55,18 +55,18 @@ VLE iframe
Week 1 Data Analysis in R for BABS 2


<h2 style="color:MediumSeaGreen;">Week 9 Overview <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-1/overview.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-1/overview.html" title="Overview" allow="fullscreen" width="800" height="400"></iframe>
<h2 style="color:MediumSeaGreen;">Overview <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-6/overview.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-6/overview.html" title="Overview" allow="fullscreen" width="800" height="600"></iframe>


<h2 style="color:MediumSeaGreen;">Independent Study to do before the workshop <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-1/study_before_workshop.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-1/study_before_workshop.html" title="Prepare" allow="fullscreen" width="800" height="400"></iframe>
<h2 style="color:MediumSeaGreen;">Independent Study to do before the workshop <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-6/study_before_workshop.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-6/study_before_workshop.html" title="Prepare" allow="fullscreen" width="800" height="400"></iframe>

<h2 style="color:MediumSeaGreen;">Workshop material <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-1/workshop.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-1/workshop.html" title="Workshop" allow="fullscreen" width="800" height="400"></iframe>
<h2 style="color:MediumSeaGreen;">Workshop material <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-6/workshop.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-6/workshop.html" title="Workshop" allow="fullscreen" width="800" height="400"></iframe>

<h2 style="color:MediumSeaGreen;">Independent Study to do after the workshop <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-1/overview.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/BIO00088H-data/core/week-2/study_after_workshop.html" title="Consolidation" allow="fullscreen" width="800" height="400"></iframe>
<h2 style="color:MediumSeaGreen;">Independent Study to do after the workshop <a href="https://3mmarand.github.io/R4BABS/r4babs2/week-6/study_after_workshop.html" target="_blank">Direct link</a></h2>
<iframe src="https://3mmarand.github.io/R4BABS/r4babs2/week-6/study_after_workshop.html" title="Consolidation" allow="fullscreen" width="800" height="400"></iframe>


###############
