This GitHub page includes scripts, input data, and images associated with the practical sessions of the 2024 Physalia Course on Adaptation Genomics, given by Mafalda Ferreira and Angela Fuentes Pardo.
These materials correspond to modified versions of the original files developed (and generously shared) by Anna Tigano, Yann Dorant and Claire Mérot, which are available here.
All tutorials (except for day 1) can be completed using the files provided in this GitHub page. Therefore, each tutorial can be run independently, ensuring that everyone can start fresh every day (even if they were unable to complete a previous practical session).
Some exercises will run on the AWS cloud compute service ("on the server"), and others will run on your local computer. Please make sure the software listed below is installed on your computer before the course begins:
For Windows users:
A prerequisite of the course is that you are familiar with Unix and R. If you think you need a quick refresher on either of them, please take a look at the suggested readings available here.
Below you can find the proposed schedule for the week. We will maintain some flexibility in the schedule to allow enough time for questions and discussions.
Please follow the instructions shared by Carlo.
- Data: All exercises will be based on the dataset from Cayuela et al. (2020), Molecular Ecology.
- Genome assembly: For this course, we generated a dummy assembly of about 90 MB (instead of about 500 MB) and 5 chromosomes (instead of 24) to expedite analysis running time.
- Raw data: Data were generated using a reduced-representation approach (GBS/RADseq) and sequenced with IonTorrent.
Note: The analyses we will learn during the course are scalable to whole-genome resequencing data or other types of genomic data.
1-1: Getting familiar with Unix environment
1-2: From raw sequences to mapped reads
1-3: Calling variants with Stacks
2-1: FST statistics with vcftools (optional: with Stacks, optional: Pairwise-FST and Isolation-by-Distance)
2-2: Principal component analysis (PCA)
2-3: Population clustering with LEA
2-4: Discriminant Analysis of Principal Components (DAPC)
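As a rough illustration of what the PCA tutorial (2-2) computes, the sketch below projects individuals onto principal components of a small genotype matrix. It is not the course script: the function name `genotype_pca`, the 0/1/2 genotype coding, the absence of missing data, and the toy matrix are all assumptions made for this example.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of a genotype matrix (hypothetical helper, not from the course).

    genotypes: array of shape (n_individuals, n_snps), coded 0/1/2,
    assumed to contain no missing values.
    Returns the PC scores and the fraction of variance each PC explains.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP on its mean genotype
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # individual coordinates
    explained = s**2 / np.sum(s**2)  # variance fraction per component
    return scores, explained[:n_components]

# Toy data (made up): individuals 1-2 and 3-4 form two groups.
toy = np.array([
    [0, 0, 1, 2],
    [0, 1, 0, 2],
    [2, 2, 1, 0],
    [2, 1, 2, 0],
])
scores, explained = genotype_pca(toy)
print(scores)
print(explained)
```

On real data, individuals from the same genetic cluster should fall close together along the first few components.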
- Data: We focus on 12 populations from Canada for which there is almost no geographic structure but great environmental variability.
3-1: Genetic structure and LD-pruning
3-2: Outliers of differentiation with two methods (OutFLANK & BayPass)
3-3: Genotype-Environment Associations with two methods (BayPass & Redundancy Analysis)
- Data: We focus on 12 populations from Canada. We recommend that you pick one of the two tutorials (haploblocks by local PCA or CNVs from RAD-seq data).
4-1: Investigating haplotype blocks (~inversions?)
This tutorial includes work on local PCA, as well as calculation of LD, FST, and the observed fraction of heterozygotes, which may be useful in other contexts.
4-2: SV calling
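The observed fraction of heterozygotes mentioned for tutorial 4-1 can be sketched in a few lines. This is a minimal example under stated assumptions, not the course code: the function name `observed_heterozygosity`, the 0/1/2 coding with `-1` for missing calls, and the toy genotypes are all hypothetical.

```python
import numpy as np

def observed_heterozygosity(genotypes, missing=-1):
    """Fraction of heterozygous calls per individual (hypothetical helper).

    genotypes: array of shape (n_individuals, n_snps), coded 0/1/2,
    with `missing` marking uncalled genotypes (assumed coding).
    Individuals with no called genotypes get NaN.
    """
    g = np.asarray(genotypes)
    called = g != missing
    het_counts = ((g == 1) & called).sum(axis=1)
    n_called = called.sum(axis=1)
    # Guard the division so fully missing individuals do not raise warnings.
    return np.where(n_called > 0, het_counts / np.maximum(n_called, 1), np.nan)

# Toy genotypes (made up) for SNPs inside a candidate region.
toy_region = np.array([
    [1, 1, 1, -1],   # heterozygous at every called SNP
    [0, 2, 0, 2],    # homozygous everywhere
    [1, 0, -1, -1],  # one heterozygous call out of two
])
het = observed_heterozygosity(toy_region)
print(het)
```

Restricting the SNP columns to a candidate haploblock and comparing the values between the groups identified by local PCA is one way such a statistic might be used; the exact procedure in the tutorial may differ.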
5-1: SnpEff annotation of SNPs for coding and regulatory regions
5-2: Intersection between SNPs and genes with bedtools
5-3: Gene ontology enrichment
5-4: (Optional) Intersection between CNVs and repeats/TE