5 scripts to perform fine-mapping :
- run fine-mapping at one specific region
finemapping/finemap_region.nf
- run fine-mapping on full summary statistics
finemapping/main.nf
- Stepwise model selection procedure to select independently associated SNPs with cojo from gcta
finemapping/cojo.nf
- Conditional analysis using gemma
finemapping/cond-assoc.nf
- Conditional analysis using gcta and summary statistics
finemapping/cond-gcta.nf
The purpose of this pipeline is to perform a initial analysis of finemapping
Our script, finemapping takes as input PLINK files, gwas file
- pipeline discarded positions duplicated from genetics file and summary statistics
The pipeline is run: nextflow run finemapping/finemap_region.nf
The key options are:
-
finemapping :
n_causal_snp
: for finemapping causal snp numberchro
: chromosome to apply chromosomebegin_seq
: begin sequence to apply sequenceend_seq
: end sequence to apply sequenceprob_cred_set
: prob of credible set [default : 0.95], for FINEMAP softwareused_pval_z
: build beta and se using z or pvale [defaul 0:no]
-
threshold_p
: treshold to used [default : 5E10-8] -
genetics data :
- 2 way are possible, data already in plink format (if possible same data that association has been done) or 1000 genome will be download
- genetics data user :
input_dir
,output_dir
: where input and output goes to and comes from;input_pat
: the base of set of PLINK bed,bim and fam files (this should only match one);
-
1000 genome :
ftp_vcf
other_cpus_req
-
file_gwas
: file contains summary stat of gwas :head_pval
: pvalue header [ default : "P_BOLT_LMM" ]head_n
: N (individuals number) [ default : None ], if not present computed with plink (and data/pheno if present)head_rs
: rs header column [default : "SNP"]head_beta
: beta header colum [default : "BETA"]head_se
: column for standard error of beta "SE"head_A1
: column for allele 1 :[default : "ALLELE1" ]head_A2
: column for allele 0 or 2 :[default : "ALLELE0" ]head_freq
: freq header [ default : ""],head_chr
: freq header [ default : ""],head_bp
: freq header [ default : ""],
-
n_pop
: number individuals for data set pop [default : 0] -
gwas_cat
: file of gwas catalog for plotheadgc_chr
: chromosome header of gwas catalogheadgc_bp
: position head of gwas cataloglist_phenogc
: from gwas catalog what pheno used (separated by a comma)file_phenogc
: from gwas catalog what pheno used inside a file (separated by a comma)
-
cojo parameter for Stepwise model selection procedure to select independently associated SNPs:
gcta_mem_req
="6GB"cojo_wind
: Specify a distance d (in Kb unit). It is assumed that SNPs more than d Kb away from each other are in complete linkage equilibrium. The default value is 10000 Kb (i.e. 10 Mb) if not specified. [ default : 10000 ] (need to be implemented)cojo_actual_geno
: If the individual-level genotype data of the discovery set are available (e.g. a single-cohort GWAS), you can use the discovery set as the reference sample. option to avoid due to a various bug [default 0]cojo_slct_other
: other option for slct see manual )(need to be implemented)
-
annotations parameter :
-
paintor_fileannot)
: file contains annotation (see paintor manual) -
binary :
finemap_bin
: software binarypaintor_bin
: software binaryplink_bin
: software binarygcta_bin
: software binary
-
mem and cpu :
max_plink_cores
LD / plink [default 4]plink_mem_req
LD / plink [default : "5GB"]gcta_mem_req
[default :"6GB"]caviar_mem_req
[default :"40GB"]paintor_mem_req
[default :"20GB"]fm_mem_req
[default : "20G"]plink_mem_req
[default :"6GB"]
need locuszoom, R : (ggplot2), python3, finemap, paintor, gcta, plink, gwama
- Data and command line can be found h3agwas-examples
nextflow run h3abioneth3agwas/finemapping/finemap_region.nf --head_pval p_wald --head_bp ps --head_chr chr --head_rs rs --head_beta beta --head_se se --head_A1 allele1 --head_A2 allele0 --list_phenogc "Type 2 diabetes" --input_dir data/imputed/ --input_pat imput_data --file_gwas data/summarystat/all_pheno.gemma --output_dir finemapping_pheno1_wind --output finemapping_pheno1 -resume -profile slurmSingularity --begin_seq 112178657 --end_seq 113178657 --chro 10
- using clump plink to defined independant locus
- for each snps apply algorithms from pipeline
finemapping/finemap_region.nf
The key options are:
- finemapping :
n_causal_snp
: for finemapping causal snp numberprob_cred_set
: prob of credible set [default : 0.95], for FINEMAP softwareused_pval_z
: build pvalue using z (need to check) [defaul 0:no]threshold_p
: treshold to used [default : 5E10-8]threshold_p2
: Secondary significance threshold for clumped SNPs see clumpsize_wind_kb
: size windows used for clump and around each independant snpsclump_r2
: ld used for clumpcut_maf
minor allele frequencies [ default : 0.01]
data
:data
: optional used individual from data to clean plink file`pheno
: phenotype associate to datacovariates
: covariate must be in data
- plink :
input_dir
,output_dir
: where input and output goes to and comes from;input_pat
: the base of set of PLINK bed,bim and fam files (this should only match one);
file_gwas
: file contains summary stat of gwas :head_pval
: pvalue header [ default : "P_BOLT_LMM" ]head_n
: N (individuals number) [ default : None ], if not present computed with plink (and data/pheno if present)head_rs
: rs header column [default : "SNP"]head_beta
: beta header colum [default : "BETA"]head_se
: column for standard error of beta "SE"head_A1
: column for allele 1 :[default : "ALLELE1" ]head_A2
: column for allele 0 or 2 :[default : "ALLELE0" ]head_freq
: freq header [ default : TODO],head_chr
: freq header [ default : TODO],head_bp
: freq header [ default : TODO],
n_pop
: number individuals for data set pop [default : 0]gwas_cat
: file of gwas catalog for plotheadgc_chr
: chromosome header of gwas catalogheadgc_bp
: position head of gwas cataloglist_phenogc
: from gwas catalog what pheno used (separated by a comma)file_phenogc
: from gwas catalog what pheno used inside a file (separated by a comma)
- cojo parameter (Stepwise model selection procedure to select independently associated SNPs.) :
gcta_mem_req
="6GB"cojo_wind
: Specify a distance d (in Kb unit). It is assumed that SNPs more than d Kb away from each other are in complete linkage equilibrium. The default value is 10000 Kb (i.e. 10 Mb) if not specified. [ default : 10000 ]cojo_actual_geno
: If the individual-level genotype data of the discovery set are available (e.g. a single-cohort GWAS), you can use the discovery set as the reference sample. option to avoid due to a various bug [default 0]cojo_slct_other
: other option for slct see manual
- mem and cpu :
max_plink_cores
LD / plink [default 4]plink_mem_req
LD / plink [default : "5GB"]gcta_mem_req
[default :"6GB"]caviar_mem_req
[default :"40GB"]paintor_mem_req
[default :"20GB"]fm_mem_req
[default : "20G"]plink_mem_req
[default :"6GB"]
- annotations parameter :
paintor_fileannot
: file contains annotation (see paintor manual)
need locuszoom, R : ggplot2, python3, finemap, paintor, gcta, plink
- $output/clump/clump_output.clumped result of clump by plink
- $output/gwascat : file of gwas catalog format
- $output/data : gene dowmloaded
- $output/fm/$chr_$bp : each positions where finemapping had been apply :
- caviarbf :result of caviarbf
- cojo_gcta : result a stepwise model selection procedure to select independently associated SNPs. showing the LD correlations between the SNPs. using gcta
- fm_cond : finemap software using option using
--cond
- fm_sss : finemap software using option
--sss
- paintor : paintor output
- $output/$chr_$bp/Finemapping_.all.out : result merge of all result
- information relative at position : rsid chromosome position allele1 allele2 rsid chromosome position allele1 allele2
- result of association : beta, se, p (if used
used_pval_z = 1
, beta and se had been computed using p-value, otherwise orignel) - gcta result, givin your independant : bj bJ_se pJ LD_r logpJ
- value of your association rsid chromosome position allele1 allele2 maf beta se
IsSig
: position significantis_cred
: is in credible interval set using finemap with--sss
cred_$chr_$bp_cond.cred
: is in credible of finemap for finemap software with option--cond
cred_$chr_$bp_sss.cred
: is in credible of finemap for finemap software with option--sss
finemap
sss result :prob_fm_sss
,log10bf_fm_sss
,mean_fm_sss
,sd_fm_sss
,mean_incl_fm_sss
,sd_incl_fm_sss
- Data and command line can be found h3agwas-examples
nextflow run h3abioneth3agwas/finemapping/main.nf --head_pval p_wald --head_bp ps --head_chr chr --head_rs rs --head_beta beta --head_se se --head_A1 allele1 --head_A2 allele0 --list_phenogc "Type 2 diabetes" --input_dir data/imputed/ --input_pat imput_data --file_gwas data/summarystat/all_pheno.gemma --output_dir finemapping_pheno1 --output finemapping_pheno1 -resume -profile slurmSingularity
this section describes a pipeline in devlopment, objectives is doing a Stepwise model selection procedure to select independently associated SNPs on full summary statistics see cojo
need python3, gcta tested for singularity image: yes
The pipeline is run: nextflow run finemapping/annot-assoc.nf
The key options are:
-
work_dir
: the directory in which you will run the workflow. This will typically be the h3agwas directory which you cloned; -
output_dir
: output directory -
output
: output pattern -
data
: same option that assoc/main.nf, file is optional, used if need select specific individus for gcta, compute frequencies or N, if mission infile_gwas
-
plink :
-
input_dir
,output_dir
: where input and output goes to and comes from; -
input_pat
: the base of set of PLINK bed,bim and fam files (this should only match one); -
pheno
: optional, header in data, if present select individuals with no missiong individual to keep individuals for computed frequencie or gcta -
cut_maf
minor allele frequencies [ default : 0.0001] -
̀
file_gwas
: file contains gwas result, if N or frequencies is not available, it is computed with plink file anddata
file, to change format header must be defined :head_pval
: pvalue header [ default : "P_BOLT_LMM" ]head_freq
: freq header [ default : None], if not present computed with plink, (and data/pheno if present)head_n
: N (individuals number) [ default : None ], if not present computed with plink (and data/pheno if present)head_rs
: rs header column [default : "SNP"]head_beta
: beta header colum [default : "BETA"]head_se
: column for standard error of beta "SE"head_A1
: column for A0 :[default : "ALLELE0" ]head_A2
: column for A1 :[default : "ALLELE1" ]
-
Cojo parameter :
cojo_wind
: Specify a distance d (in Kb unit). It is assumed that SNPs more than d Kb away from each other are in complete linkage equilibrium. The default value is 10000 Kb (i.e. 10 Mb) if not specified. [ default : 10000 ]cojo_actual_geno
: If the individual-level genotype data of the discovery set are available (e.g. a single-cohort GWAS), you can use the discovery set as the reference sample. option to avoid due to a various bug [default 0]cojo_slct
: Perform a stepwise model selection procedure to select independently associated SNPs? 1 : yes 0 : no [default 1]cojo_p
: Threshold p-value to declare a genome-wide significant hit. The default value is 5e-8 if not specified. This option is only valid in conjunction with the optioncojo_slct
.cojo_slct_other
: other option for slct see manual
cojo_top_snps_chro
: Perform a stepwise model selection procedure to select a fixed number of independently associated SNPs by chromosome without a p-value threshold. [integer between 0 and n, to define top snp number. default : 0].gcta_mem_req
="6GB"
- Data and command line can be found h3agwas-examples
nextflow run h3abionet/h3agwas/finemapping/cojo-assoc.nf --head_pval p_wald --head_bp ps --head_chr chr --head_rs rs --head_beta beta --head_se se --head_A1 allele1 --head_A2 allele0 --input_dir data/imputed/ --input_pat imput_data --output_dir cojo --data data/pheno/pheno_test.all --pheno pheno_qt1 --file_gwas data/summarystat/all_pheno.gemma -resume -profile slurmSingularity
cond-assoc.nf
need python3, gemma, R tested for singularity image: yes
- pipeline used :
- plink file and phenotype file
pos_cond
: list position to be conditionned, transform in 0 (homozygote A1), 0.5 (heterozygote) and 1 (homozygote A2) and used as covariablechro_cond
: chromosome to be conditionnedpos_ref
: position de reference where will be extracted positions of interrest
- pipeline will run :
- for each
pos_cond
, run gemma on the region using as covariablepos_cond
pos_cond
, run gemma on the region using as covariable allpos_cond
(call merge)- run gemma on the region using just phenotype
- for each
- output :
- each gemma output
- plot of LD between ref and cond
- plot of p-value comparison of initial without conditional and conditional
- data and command line can be found h3agwas-examples
- to perform a conditional analysis, using gemma where positions is used as covariable of the phenotype and check pos_ref (
--pos_ref
) is link or indepependant , argument same than gwas for gemma where you need to add :- pipeline will run a raw gwas using phenotype and covariable and after performed gwas using genotypes of each position from
pos_cond
as covariable chro_cond
: chro wherepos_cond
andpos_ref
pos_ref
: pos will be tested for independance or not topos_cond
pos_cond
: list of position as conditional to verify indpendance withpos_ref
, include as covariable
- pipeline will run a raw gwas using phenotype and covariable and after performed gwas using genotypes of each position from
- pipeline will compute ld between your positions ref and cond and produce figure
need python3, gcta, R tested for singularity image: yes
pipeline will extract positions pos_cond
, pos_ref
from summary statistics and genetics data and apply options --cojo-cond
of gcta .
- pipeline used :
- plink file, summary statistics and optional file for individual to clean
pos_cond
: list position to be conditionnedchro_cond
: chromosome to be conditionnedpos_ref
: 1 position de reference where will be extracted positions of interrestcojo_actual_geno
- plink :
input_dir
,output_dir
: where input and output goes to and comes from;input_pat
: the base of set of PLINK bed,bim and fam files (this should only match one);data
: same option that assoc/main.nf, file is optional, used if need select specific individus for gcta, compute frequencies or N, if mission infile_gwas
phenotype
pheno
: optional, header in data, if present select individuals with no missiong individual to keep individuals for computed frequencie or gctacut_maf
minor allele frequencies [ default : 0.0001]- ̀
file_gwas
: file contains gwas result, if N or frequencies is not available, it is computed with plink file anddata
file, to change format header must be defined : head_pval
: pvalue header [ default : "P_BOLT_LMM" ]head_freq
: freq header [ default : None], if not present computed with plink, (and data/pheno if present)head_n
: N (individuals number) [ default : None ], if not present computed with plink (and data/pheno if present)head_rs
: rs header column [default : "SNP"]head_beta
: beta header colum [default : "BETA"]head_se
: column for standard error of beta "SE"head_A1
: column for A0 :[default : "ALLELE0" ]head_A2
: column for A1 :[default : "ALLELE1" ]