gracemarionpower edited this page Apr 15, 2024 · 35 revisions

Welcome to the Lifecourse-GWAS wiki!

This wiki will guide you through the Lifecourse-GWAS consortium analysis. We have tried to write the pipeline to minimise time and energy required by analysts to contribute data to the overall effort, to ensure harmonisation across cohorts, and minimise errors.

This section provides instructions for the data preparation and GWAS analyses of specified time-varying phenotypes (BMI,..) in each of the participating cohorts, measured every year up to 18 years of age and every five years after 18 years of age. Using standardised procedures across all samples is critical to the effectiveness of the subsequent meta-analyses, which will be run internally upon receipt of these GWAS. Because there is always a chance of error, we may ask for some analyses to be re-run. We encourage analysts to organise and save their scripts, files, and directories in case a re-analysis is required.

Why have we set up a Lifecourse-GWAS consortium?

Acute, chronic, and recurring, adverse health conditions that emerge in later life are often shaped by processes experienced throughout life. Gaining a better understanding of how exposures at different stages in the lifecourse influence health outcomes is key to elucidating the potential benefits of specific disease prevention and treatment strategies.

Mendelian randomisation (MR) is a technique that exploits the random assortment of genetic variants inherited from parents to offspring, independent of other traits. This reduces susceptibility to confounding factors, including confounding by undiagnosed existing disease (reverse causation). MR is increasingly being used to estimate causal effects of modifiable risk factors across the lifecourse on later life outcomes. To robustly run MR, valid instrumental variables must be employed which require large-scale datasets comprising phenotype and genotype data. Consequently, analyses are currently confined to the examination of a narrow selection of phenotypes at a few specific time periods due to data restrictions regarding the measurement of multiple phenotypes at specific time periods in most cohort studies. This consortium sets out to expand potential in this area, by aggregating these data from a wide range of cohorts. This will enable us to develop a more comprehensive set of instruments for future MR analyses to be able to estimate the effects of a range of phenotypes at multiple time periods across the lifecourse on later life outcomes.

Aim

In order to explore how selected phenotypes at different stages in the lifecourse modify risk, we will seek to combine the results of multiple genome-wide association studies of these phenotypes in meta-analyses. This will increase the probability of detection of genetic variants associated with individual differences to generate valid instrumental variables for use in MR analyses.

Setup

1. Get the code

Use git to clone the repository:

git clone https://github.com/MRCIEU/Lifecourse-GWAS.git

2. Configure the analysis

Set up your directory locations. Copy config-template.env to a new file called config.env, then edit it so that it contains the paths to your genotype and phenotype data locations, etc., as required:

cp config-template.env config.env

We recommend using data paths that are outside the cloned code repository. You will need the following working data directories, ideally on a fast disk that can be accessed by HPC nodes:

phenotype_input_dir="/EDIT/THIS/PATH"
genotype_input_dir="/EDIT/THIS/PATH"
phenotype_processed_dir="/EDIT/THIS/PATH"
genotype_processed_dir="/EDIT/THIS/PATH"

Do not commit config.env, as its contents would otherwise be visible to everyone.
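
One way to guard against committing config.env by accident is to add it to the clone's local exclude list. This is a minimal sketch: `ignore_local_config` is a hypothetical helper, not part of the pipeline, and the repository may already ignore config.env, in which case the function does nothing.

```shell
# Hypothetical helper (not part of the pipeline): make sure git ignores
# config.env in the given clone without modifying any tracked file.
ignore_local_config() {
  repo="${1:-.}"   # assumption: path to the top level of the cloned repository
  # git check-ignore exits 0 when the path is already covered by an ignore rule.
  if ! git -C "$repo" check-ignore -q config.env; then
    mkdir -p "$repo/.git/info"
    # .git/info/exclude holds ignore rules that are local to this clone.
    echo "config.env" >> "$repo/.git/info/exclude"
  fi
}
```

Run `ignore_local_config .` from the top level of the repository; afterwards `git status` should no longer list config.env as an untracked file. This does not help if config.env has already been committed.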

Note that we will never ask you to transfer any data from the raw individual-level data directories listed above. All results from the pipeline will be stored in:

results_dir="/EDIT/THIS/PATH"

Only non-disclosive summary data will be stored here, so its contents are safe to transfer to our servers for checking and subsequent meta-analysis etc.
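
Before running anything, it can help to confirm that every directory named in config.env has been edited and exists. The following is a sketch only: `check_config_dirs` is a hypothetical helper, not part of the pipeline, and it assumes config.env contains plain shell-style variable assignments as shown above.

```shell
# Hypothetical helper (not part of the pipeline): check that each directory
# variable in the config file is set, is not the template placeholder, and
# points to an existing directory. The function body runs in a subshell so
# that the sourced variables do not leak into the calling shell.
check_config_dirs() (
  config_file="${1:-config.env}"
  # Prefix bare file names with ./ so that "." does not search PATH.
  case "$config_file" in
    */*) ;;
    *) config_file="./$config_file" ;;
  esac
  if [ ! -f "$config_file" ]; then
    echo "missing config file: $config_file" >&2
    exit 2
  fi
  . "$config_file"
  status=0
  for var in phenotype_input_dir genotype_input_dir \
             phenotype_processed_dir genotype_processed_dir results_dir; do
    eval "dir=\${$var:-}"   # look up the value of the variable named in $var
    if [ -z "$dir" ] || [ "$dir" = "/EDIT/THIS/PATH" ]; then
      echo "$var has not been set in $config_file" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points to a missing directory: $dir" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Calling `check_config_dirs config.env` returns 0 only when all five directories are set and present; otherwise it prints one message per problem to standard error.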

3. Installing R packages

In R (ideally version 4.3.2) run the following:

install.packages("renv")
renv::restore()

This will automatically install all the correctly versioned R packages required to run the pipeline.
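
On an HPC system it may be more convenient to run this step non-interactively from the shell. The sketch below wraps the same two R calls in a hypothetical helper, `restore_r_packages`, which is not part of the pipeline. It assumes that a `renv.lock` file sits in the top level of the cloned repository, that `Rscript` is on your PATH, and that the CRAN mirror given is reachable from your system.

```shell
# Hypothetical helper (not part of the pipeline): install renv if needed and
# restore the locked package versions without any interactive prompts.
restore_r_packages() (
  project_dir="${1:-.}"   # assumption: top level of the cloned repository
  if [ ! -f "$project_dir/renv.lock" ]; then
    echo "no renv.lock found in $project_dir" >&2
    exit 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    exit 1
  fi
  cd "$project_dir" || exit 1
  # Print the R version in use, install renv only if missing, then restore.
  Rscript -e 'cat(R.version.string, "\n")' \
          -e 'if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv", repos = "https://cloud.r-project.org")' \
          -e 'renv::restore(prompt = FALSE)'
)
```

Run `restore_r_packages .` from the repository directory. The printed R version lets you compare against the recommended 4.3.2 before the packages are installed.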

Support

If you need support with any of the steps in this pipeline please contact:
