-
Notifications
You must be signed in to change notification settings - Fork 1
/
syllabus.Rmd
256 lines (173 loc) · 17 KB
/
syllabus.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
---
title: "Syllabus"
description: |
STAT 540: Statistical Methods for High Dimensional Biology
output:
distill::distill_article:
toc: true
toc_depth: 2
---
2023-2024 Winter Term 2 (January 8, 2024 - April 12, 2024)
STAT 540 is a 3 credit course with a mandatory computing seminar
Cross-listed as STAT 540, BIOF 540, GSAT 540
## Land acknowledgement
We respectfully recognize that the University of British Columbia Vancouver campus is located on the traditional, ancestral, and unceded territory of the xʷməθkʷəy̓əm (Musqueam) people. Please take a moment to learn about the territory you are occupying by visiting [this interactive indigenous land map](https://native-land.ca/).
## Course-level learning objectives
* Perform exploratory data analysis and visualize genomics data
* Apply tailored statistical methods to answer questions using high dimensional biological data
* Make your work reproducible, reusable, and shareable
* Work with real data in a collaborative model
## Selected topics
* Basics of molecular genetics/genomic and data collection assays (methods)
* Basic probability and math foundations
* Exploratory data analysis and data quality control
* Normalization, batch correction
* Causal inference and confounding effects
* Basic statistical inference (“one gene at a time”) – linear models
* Large-scale inference (“genome-wide”) – multiple testing
* Analysis of microarray, RNASeq, and epigenetics data
* Principal Component Analysis and clustering (unsupervised machine learning)
* Classification and cross validation (supervised machine learning)
* Gene set analysis and gene networks
* Genome-wide association analysis (GWAS)
* Single-cell genomics
## Teaching Team
For more info on the Teaching team, including brief bios, see the 'People' pages on this website (linked below). Links for connecting to recurring virtual office hours will be shared via Canvas.
### Instructors
* [Keegan Korthauer, PhD](about_keegan.html) (She/Her/Hers)
Email: <keegan@stat.ubc.ca>
Virtual office hours: By appointment
* [Yongjin Park, PhD](about_yongjin.html) (He/Him/His)
Email: <ypp@stat.ubc.ca>
Virtual office hours: By appointment
### Teaching Assistants
* [Asfar Lathif](about_asfar.html) (He/Him/His)
Email: <asfar.lathif@ubc.ca>
Virtual office hours: Tuesday 4-5pm (PST)
* [Ishika Luthra](about_ishkika.html)
Email: <ishika.luthra@ubc.ca>
Virtual office hours: Friday 10-11am (PST)
## Schedule
### Lectures (Sec 201)
- Time: Tues Thurs 9:00 - 10:30am
- Location: Building and room number listed on Canvas
- See [Lectures](lectures.html) for lecture materials and schedule
### Seminars (Sec S2B)
- Time: Tues 12:30pm - 1:30pm
- Location: Building and room number listed on Canvas
- See [Seminars](seminars.html) for schedule and seminar materials
## Course communication
#### Announcements
Course announcements will be posted on Canvas. You are responsible for checking it regularly.
#### General questions
Piazza (in Canvas) for posting questions (e.g. topics discussed in class, questions about course organization, assignment clarifications, if you are stuck on an assignment and need help). This ensures the message can be seen by the entire teaching team, and that others in the class who might have the same question can learn from responses. You are also welcome to share your input on questions posted by others.
#### Private matters
For private matters (e.g. requesting an extension, scheduling appointment for office hours), please contact the Teaching team by email (listed above). Do not use email to ask general questions described above.
#### Group work
In your final project groups, we expect you to (1) arrange regular meetings either in person or virtually and (2) make use of your team's Piazza group.
## Prerequisites and Resources
This interdisciplinary course requires a broad range of skills at the interface of statistics, molecular biology / genomics, and computing. While you may have some background in one or more of the following areas, you are not expected to be an expert in all. That said, to be successful in the course, you may need to spend a bit more time in the areas that you have less experience in. Although there are no official prerequisites for the course, here is a list of useful skills to bring into the course and/or learn along the way.
### Statistics:
- You should have already taken a university level introductory statistics course.
- [This free online book "Modern Statistics for Modern Biology" by Susan Holmes and Wolfgang Huber](http://web.stanford.edu/class/bios221/book/) is a great reference for introduction or review of many of the statistical concepts that are relevant for this course.
- [This free online book "Data Analysis for the Life Sciences" by Rafael Irizarry and Michael Love](http://genomicsclass.github.io/book/) is another great resource for introduction or review of many of the statistical concepts relevant in this course, with an emphasis on implementation in R.
### Biology:
- No requirements, but you are expected to learn things like, e.g. the difference between DNA and RNA, and the difference between a gene and a genome.
- See [this video](https://www.youtube.com/watch?v=lSqUDu4zb5k) and this [slideset](https://github.com/STAT540-UBC/resources/blob/main/biology-intro-2017.pdf) for some basic introductory material.
- Go through the (optional, not for a grade) molecular biology quiz on Canvas
- [This free online book "Concepts of Biology" by Fowler, Roush & Wise](https://openstax.org/books/concepts-biology/pages/1-introduction) is a great resource for biological concepts, in particular chapters 6 and 9
- [This free online book "Biology" by Clark, Douglas & Choi](https://openstax.org/books/biology-2e/pages/1-introduction) goes more in-depth, see Chapters 14, 15, and 16 for material on genetics that is particularly relevant for this course.
- No matter your prior experience, when you come across a new biological concept during the course or in your research, you might need to spend a bit of time 'learning as you go' - this is expected and something I still do regularly in my day-to-day research!
### R:
- No experience required but be prepared to do a lot of self-guided learning if you haven't taken other courses on R or used it in your research.
- Start now by installing [R](https://cran.r-project.org/) and the HIGHLY RECOMMENDED "integrated development environment" (IDE) [RStudio](https://rstudio.com/products/rstudio/download/) - both are free and open source.
- You should be able to run R on your own computer during each seminar session.
- If you are new to R, check out [this blog post on getting started with R](http://santina.me/Get-started-with-R/).
- [This free online book "Introduction to Data Science" by Rafael Irizarry](https://rafalab.github.io/dsbook/) is also a great resource for getting more in-depth with R, programming basics, and the tidyverse. In particular see Chapters 1-5:
- [Chapter 1: Getting Started with R and R Studio](https://rafalab.github.io/dsbook/getting-started.html)
- [Chapter 2: R Basics](https://rafalab.github.io/dsbook/r-basics.html)
- [Chapter 3: Programming Basics](https://rafalab.github.io/dsbook/programming-basics.html)
- [Chapter 4: The tidyverse](https://rafalab.github.io/dsbook/tidyverse.html)
- [Chapter 5: Importing data](https://rafalab.github.io/dsbook/importing-data.html)
### Git/GitHub and R Markdown:
- In this course we'll be using the version control software [Git](https://en.wikipedia.org/wiki/Git) and its web-based hosting and collaborative platform [GitHub](https://en.wikipedia.org/wiki/GitHub).
- [The online resource "Happy Git and GitHub for the useR" from Jenny Bryan](https://happygitwithr.com/) is a great reference for these tools as we learn them.
- Another helpful git resource is [Hadley Wickham's webinar "Collaboration and time travel- version control with git, github and RStudio"](https://rstudio.com/resources/webinars/collaboration-and-time-travel-version-control-with-git-github-and-rstudio/)
- We'll learn about using [R markdown](https://rmarkdown.rstudio.com/) to generate readable and reproducible reports with code and text, and you'll be using that a lot in this course - see [Chapter 18 of the 'Happy Git' resource: "Test drive R markdown"](https://happygitwithr.com/rmd-test-drive.html).
## Evaluation
You will have three individual assignments, six seminar submissions (one divided into two parts), and one group project. <font color="red">**Deadlines are all by 11:59 pm (Pacific time) on the due date. Any submission or modification after the due date will not be graded unless you have requested an extension**</font>. If you anticipate having trouble meeting a deadline and need to request an extension/academic concession please reach out in advance via email to the instructors.
**For more detail on each of these assignments, see the [assignments page](assignments.html)** (the header of each assignment on this page points to the relevant section of the assignments page). Also refer to this [visual overview of the timeline](timeline.html).
**For detailed instructions on how to work and submit assignments for this course, please see the [Submission Guide](submission_guide.html).**
### [Intro Assignment (5%)](assignments.html#intro-assignment-5)
- An introductory assignment designed to assess basic knowledge of GitHub, R and Rmarkdown
```{r, echo = FALSE, warning = FALSE}
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(knitr))
assn <- read.csv("course-admin/assn_2024.csv")
assn <- assn %>%
select(content,start,Category) %>%
rename(Assignment = content, `Due Date` = start) %>%
mutate(`Due Date` = as.Date(`Due Date`, "%Y-%m-%d")) %>%
arrange(`Due Date`) %>%
mutate(`Due Date` = format(`Due Date`, "%a %d %B %Y"))
kable(assn %>% filter(Assignment == "Intro Assignment") %>% select(-Category))
```
### [Seminar completion (20%)](assignments.html#seminar-completion-20)
- You will submit short "deliverables" for each seminar session
- Each seminar session is weighted equally, but the lowest score will be dropped (so that the 10 seminars with highest score will each count for 2% of the final grade).
- These deliverables give practical experience applying the knowledge that will be helpful on the homework assignment, final project, and (hopefully) your future research.
- Each deliverable is due on the Friday following the TA-led session for that seminar
```{r, echo = FALSE, warning = FALSE}
kable(assn %>% filter(Category == "Seminar") %>% select(-Category))
```
### [Paper critique (5%)](assignments.html#paper-critique-5)
See [here](critique.html) for detailed instructions and rubric.
```{r, echo = FALSE, warning = FALSE}
kable(assn %>% filter(Category == "Paper Critique") %>% select(-Category))
```
### [Analysis assignment (30%)](assignments.html#analysis-assignment-30)
- Involves detailed analysis of real data using R
- This assignment will assess your ability to understand and apply the methods learned in class
```{r, echo = FALSE, warning = FALSE}
kable(assn %>% filter(Category == "Individual Assignment", Assignment != "Intro Assignment") %>% select(-Category))
```
### [Group project (40%)](assignments.html#final-group-project-40)
- A semester-long data analysis group project where you will apply the techniques covered in class to a research question of your choosing
- Groups of target size of 4 students will be formed at the beginning of the course
- Important checkpoints during the term (with deliverables):
- Project Proposal Lightning Talks
- Written Project Proposal
- Progress Report
- Oral presentation
- Written Report & GitHub repository
- Individual and Group Evaluation
- For more details on the project components and how groups are selected see the [assignments page](assignments.html#final-group-project-40)
- For detailed grading rubrics of the final project components, see the [final project rubric page](group_project_rubrics.html) and the [final project presentation rubric](course-admin/ProjectPresentationRubric.pdf)
```{r, echo = FALSE, warning = FALSE}
kable(assn %>% filter(Category == "Group Project") %>% select(-Category))
```
## Academic Concession {#concession}
If you anticipate having trouble meeting a deadline and need an [academic concession](https://students.ubc.ca/enrolment/academic-learning-resources/academic-concessions), please reach out in advance via email to the instructors. [Here](course-admin/concession_template.pdf) is a template you can use for a self-declaration.
If you miss class, we suggest you to:
- Consult the class resources on the course website
- Use the class Piazza discussion board to discuss missed material
- Come to virtual office hours
- Seek academic concessions, if applicable
## Academic Integrity
Not only is academic integrity is essential to the successful functioning of our course, but adopting best practices will benefit you in your research practice. Make sure you understand UBC’s definitions of academic misconduct and its consequences. Policy guidelines can be found [here](http://www.calendar.ubc.ca/Vancouver/index.cfm?tree=3,54,111,958).
What does academic integrity look like in this course?
* **Do your own work**. All individual work that you submit should be completed by you and submitted by you. Do not receive or share completed coursework with students who take the course in another term.
* **Acknowledge others’ ideas**. Scholars build on the work of others, and give credit accordingly. This refers to both outside sources (e.g. from the literature, or AI tools), and inside sources (e.g from your peers).
* **Learn to avoid unintentional plagiarism.** Visit the [Learning Commons’ guide to academic integrity](http://learningcommons.ubc.ca/resource-guides/avoiding-plagiarism/) to help you organize your writing as well as understand how to prevent unintentional plagiarism.
At any time: if you are unsure if a certain type of assistance is authorized, please ask us.
## Privacy
This course requires students to work on github.com. Please be advised that the material and information you put on GitHub will be stored on servers outside of Canada. Data used for these tools may not be protected, as they have not gone through a Privacy Impact Assessment (PIA) and been identified as FIPPA compliant. When you access GitHub, you will be required to create an account. While this tool has a [privacy policy](https://docs.github.com/en/site-policy/privacy-policies/github-privacy-statement), UBC cannot guarantee security of your private details on servers outside of Canada. Please exercise caution whenever providing personal information. Please feel free to contact UBC (<access.and.privacy@ubc.ca>) or the support team for GitHub if you have any questions.
## Learning Environment & Support
We strive to provide a learning environment where all students can succeed. Please join me in contributing to a classroom culture where everyone feels valued. If you encounter an issue that presents a barrier to your learning, please reach out to us. You can also contact the Ombudsperson for help: https://ombudsoffice.ubc.ca. If you have a documented disability that affects your learning, you may contact the UBC Centre for Acessibility: https://students.ubc.ca/about-student-services/centre-for-accessibility, and contact us as soon as possible if you think you may require accommodation options for course work. Use of class time
Discussions during our scheduled class time are intended to relate to the learning outcomes for the course.
Your mental health and wellbeing impacts your academic performance. Sometimes it is possible to manage challenges on your own, while other times you may need support. UBC is committed to providing student mental health and wellbeing resources, strategies, and services that help you achieve your goals. Visit https://students.ubc.ca/health for more information.
## Health and Safety in the Classroom
Please follow the current [UBC Communicable Disease Prevention Guidelines](https://srs.ubc.ca/health-safety/safety-programs/communicable-disease-prevention-framework/) regarding self-monitoring, and staying home if you are sick. Although masks are no longer required on campus, please respect the choices of your fellow students and the instructional team who may continue to wear masks.
## Extreme Environmental Conditions Contingency Plan
In-person, on campus activities may need to be cancelled due to issues such as weather conditions (e.g., snow). The most up-to-date information about cancellations will be posted on [ubc.ca](https://www.ubc.ca/). Please check [ubc.ca](https://www.ubc.ca/) often during times when an extreme weather event could disrupt our course activities. Here is what you can expect in the event an in-person lecture or lab session is cancelled:
Depending on the nature of the planned in-class activities, class may take place over Zoom (in this case, Zoom link will be posted on Canvas), or an alternate activity may be posted on Canvas for you to complete before the next scheduled class. We will communicate via Canvas to announce the specifics for each case that arises as soon as we can.