2024 Physalia Adaptation Genomics Course

Welcome 👋

This GitHub page includes scripts, input data, and images associated with the practical sessions of the 2024 Physalia Course on Adaptation Genomics, given by Mafalda Ferreira and Angela Fuentes Pardo.

These materials correspond to modified versions of the original files developed (and generously shared) by Anna Tigano, Yann Dorant and Claire Mérot, which are available here.

All tutorials (except for day 1) can be completed using the files provided on this GitHub page. Each tutorial can therefore be run independently, so everyone can start fresh every day (even if they were unable to complete a previous practical session).

Table of contents

Before the course

Install required software

Some exercises will be run on the AWS cloud compute service ("on the server"), and others will be run on your local computer. Please make sure the software listed below is installed on your computer before the course begins:

For Windows users:

(Optional) Refresher on Unix and R

A prerequisite of the course is familiarity with Unix and R. If you think you need a quick refresher on either of them, please take a look at the suggested readings available here.

During the course

Schedule

Below you can find the proposed schedule for the week. We will maintain some flexibility in the schedule to allow enough time for questions and discussions.

[Image: schedule]

Log in to the AWS server from your computer

Please follow the instructions shared by Carlo.

Tutorials

Visual overview

[Image: workflow]

Day 1: Handling NGS data, from raw reads to a SNP matrix

  • Data: All exercises will be based on the dataset from Cayuela et al. (2020), Molecular Ecology.

  • Genome assembly: For this course, we generated a dummy assembly of about 90 MB (instead of about 500 MB) with 5 chromosomes (instead of 24) to shorten analysis run times.

  • Raw data: Data were generated using a reduced-representation approach (GBS/RADseq) and sequenced with IonTorrent.

Note: The analyses we will learn during the course also scale to whole-genome resequencing data and other types of genomic data.

1-1: Getting familiar with the Unix environment

1-2: From raw sequences to mapped reads

1-3: Calling variants with Stacks

Day 2: Population structure and confounding factors

2-1: FST statistics with vcftools (optional: with Stacks, optional: Pairwise-FST and Isolation-by-Distance)

2-2: Principal component analysis (PCA)

2-3: Population clustering with LEA

2-4: Discriminant Analysis of Principal Components (DAPC)
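
To give a feel for what an FST value measures before tutorial 2-1, here is a minimal Python sketch for a single biallelic SNP and two populations. It uses the simple heterozygosity-based definition, FST = (HT - HS) / HT, and assumes equal population sizes. This is an illustration only, not the estimator used in the tutorial: vcftools reports the Weir and Cockerham estimator, which additionally corrects for sample size. The function name and the allele frequencies are made up for this example.

```python
from typing import Optional


def simple_fst(p1: float, p2: float) -> Optional[float]:
    """FST = (HT - HS) / HT for one biallelic SNP and two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Assumes equal population sizes (an assumption of this sketch).
    Returns None when the SNP is monomorphic overall (HT == 0).
    """
    # Mean expected heterozygosity within populations
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return None
    return (ht - hs) / ht


# Hypothetical allele frequencies
fst_differentiated = simple_fst(0.2, 0.8)  # strongly differentiated SNP
fst_identical = simple_fst(0.5, 0.5)       # identical frequencies
fst_monomorphic = simple_fst(0.0, 0.0)     # no variation at all
```

Identical allele frequencies give an FST of 0, and the value grows as the two populations diverge.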

Day 3: Outlier detection and Genome-by-Environment associations

  • Data: We focus on 12 populations from Canada for which there is almost no geographic structure but great environmental variability.

3-1: Genetic structure and LD-pruning

3-2: Outliers of differentiation with two methods (OutFLANK & BayPass)

3-3: Genotype-Environment Associations with two methods (BayPass & Redundancy Analysis)

Day 4: Accounting for Structural Variants

  • Data: We focus on 12 populations from Canada. We recommend that you pick one of the two tutorials (haploblocks by local PCA or CNVs from RAD-seq data).

4-1: Investigating haplotype blocks (~inversions?)

This tutorial includes work on local PCA, as well as the calculation of LD, FST, and the observed fraction of heterozygotes, which may be useful in other contexts.

4-2: SV calling
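
As a small illustration of one of the statistics mentioned for tutorial 4-1, the observed fraction of heterozygotes at a SNP can be sketched in a few lines of Python. The sketch assumes genotypes are coded as the number of alternate alleles (0, 1 or 2) with `None` for missing calls. The coding, the function name and the toy data are assumptions made for this example, not the format or code used in the tutorial.

```python
from typing import Dict, List, Optional, Sequence


def observed_het(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Fraction of heterozygous calls (code 1) among non-missing genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # every individual is missing at this SNP
    return sum(1 for g in called if g == 1) / len(called)


# Toy data (made up): one list of individual genotypes per SNP
toy_snps: Dict[str, List[Optional[int]]] = {
    "snp1": [0, 1, 1, 2, None, 1],   # 3 heterozygotes out of 5 called
    "snp2": [0, 0, 0, 0, 0, 0],      # no heterozygotes
    "snp3": [None, None, None],      # all missing
}

hobs = {name: observed_het(calls) for name, calls in toy_snps.items()}
```

Missing calls are excluded from the denominator, so a SNP with many missing genotypes is not mistaken for one with low heterozygosity.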

Day 5: Functional approaches

5-1: SnpEff annotation of SNPs for coding and regulatory regions

5-2: Intersection between SNPs and genes with bedtools

5-3: Gene ontology enrichment

5-4: (Optional) Intersection between CNVs and repeats/TE
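
The idea behind tutorial 5-2 (finding which SNPs fall inside which genes) can be sketched in plain Python. The tutorial itself uses bedtools. This toy version assumes BED-style coordinates (0-based start, end not included) and checks every gene on the same chromosome, which is fine for small inputs but far slower than bedtools on real data. All names and coordinates below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int, str]  # chromosome, start (0-based), end (exclusive), gene name
Snp = Tuple[str, int, str]        # chromosome, 0-based position, SNP id


def snps_in_genes(snps: List[Snp], genes: List[Gene]) -> Dict[str, List[str]]:
    """Map each SNP id to the names of the genes that contain it."""
    # Group genes by chromosome so each SNP is only compared to genes on its own chromosome
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits: Dict[str, List[str]] = {}
    for chrom, pos, snp_id in snps:
        hits[snp_id] = [
            name
            for start, end, name in genes_by_chrom.get(chrom, [])
            if start <= pos < end  # half-open interval, as in BED files
        ]
    return hits


# Made-up example data
toy_genes = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 300, "geneB"),
    ("chr2", 0, 50, "geneC"),
]
toy_snps = [
    ("chr1", 160, "s1"),  # inside geneA and geneB
    ("chr1", 200, "s2"),  # just past the end of geneA, still inside geneB
    ("chr1", 50, "s3"),   # intergenic
    ("chr2", 10, "s4"),   # inside geneC
]

overlaps = snps_in_genes(toy_snps, toy_genes)
```

SNPs that overlap no gene are kept with an empty list, which makes it easy to count intergenic SNPs afterwards.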

Additional resources

Cheat sheet of basic Unix commands.

Cheat sheet of basic R commands.