-
Notifications
You must be signed in to change notification settings - Fork 3
2. Real example for X‐Wing
Here we provide a concrete example to run X-Wing
. We use BMI European GWAS summary statistics from UKBB (N = 359,983) and East Asian GWAS summary statistics from BBJ (N = 158,284) as the training data and 1000G as the reference panel. The goal in is to calculate the BMI PRS for the target East Asian sample.
A detailed explanation for the available flags can be found in X-Wing.
After downloading X-Wing
, you can download all the example data and required LD from reference panel using
#!/bin/bash
cd X-Wing
bash download_example.sh
#!/bin/bash
cd example
mkdir ./results/LOGODetect
Rscript /X-Wing/LOGODetect.R \
--sumstats ./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt \
--n_gwas 359983,158284 \
--ref_dir ./data/LOGODetect_1kg_ref \
--pop EUR,EAS \
--block_partition /X-Wing/block_partition.txt \
--gc_snp /X-Wing/1kg_hm3_snp.txt \
--out_dir ./results/LOGODetect \
--n_cores 20 \
--target_pop EAS \
--n_topregion 1000
- The first few line of GWAS summary statistics is
head ./data/sumstats/BMI_EUR.txt CHR SNP BP A1 A2 BETA P 1 rs12238997 693731 G A -0.276816514205779 0.781921 1 rs58276399 731718 C T -0.412794747579417 0.679757 1 rs61770163 732032 C A -0.056528239021207 0.954921
- The output files include the regions with significant local genetic correlation and the annotation based on local genetic correlation as shown below
head ./results/LOGODetect/LOGODetect_regions.txt chr begin_pos stop_pos stat pval qval 1 32159588 32175927 13.9007303828433 0.00479904019196161 0.0107599532725034 1 74992278 75006328 23.659452286243 0.0001999600079984 0.00107756226532471 1 107872936 107899979 13.5101005165975 0.0017996400719856 0.00492533072332902 head ./results/LOGODetect/annot_EUR.txt CHR SNP A1 A2 Anno 1 rs12238997 G A 0 1 rs58276399 C T 0 1 rs61770163 C A 0
Step 2. Calculate the SNPs posterior effect size by incorporating the portability-informed annotation into PANTHER
Here, we fit the PANTHER PRS model using European and East Asian GWAS summary statistics, and incorporate the cross-population local genetic correlation annotation derived from Step1 into PRS modeling; We specify that our target population is East Asian and use 1000G data as reference panel; We specify that our target sample plink bim
file is located in ./data/test.bim
.
#!/bin/bash
cd example
mkdir ./results/PANTHER; mkdir ./results/PANTHER/post; mkdir ./results/PANTHER/post_collect
for chr in {1..22};do
for pst_pop in {EUR EAS};do
python \
/X-Wing/PANTHER.py \
--ref_dir ./data/PANTHER_1kg_ref \
--bim_prefix ./data/test \
--sumstats ./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt \
--n_gwas 359983,158284 \
--anno_file ./results/LOGODetect/annot_EUR.txt,./results/LOGODetect/annot_EAS.txt \
--chrom ${chr} \
--pop EUR,EAS \
--target_pop EAS \
--pst_pop ${pst_pop} \
--out_name BMI \
--seed 3 \
--out_dir ./results/PANTHER/post
done
done
- The format of the input GWAS summary statistics is the same as LOGODetect.
- The format of the input annotation file is the same as the output annotation file for LOGODetect.
- We used for loop in this example, but in practice it can be time-consuming. We highly recommend parallel computation for the 22 autosomes and each posterior populations.
- The output file of step2 is
head ./results/PANTHER/post/BMI_EAS_pst_eff_phiauto_chr1.txt 1 rs4475691 846808 T C 7.715496e-05 1 rs7537756 854250 G A -5.351412e-05 1 rs3748592 880238 A G -2.837376e-04
where the columns in order are chromosome, SNP rs ID, BP, A1, A2, and posterior effect size estimate. Each row is a SNP.
-
PANTHER
fit the model for each chromosome separately. One can merge each output file for 22 autosomes into one file by:cat ./results/PANTHER/post/BMI_EUR_pst_eff_phiauto_chr*.txt > ./results/PANTHER/post_collect/BMI_EUR_Post.txt sed -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/PANTHER/post_collect/BMI_EUR_Post.txt cat ./results/PANTHER/post/BMI_EAS_pst_eff_phiauto_chr*.txt > ./results/PANTHER/post_collect/BMI_EAS_Post.txt sed -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/PANTHER/post_collect/BMI_EAS_Post.txt
Here, given the full GWAS summary statistics from East Asian population, we try to subsample the GWAS summary statistics for the training and validation set. We set the proportion of the training set is 0.75 and replicate the subsampling 4 times:
cd example
mkdir ./results/LEOPARD/sampled_sumstats
Rscript /X-Wing/LEOPARD_Sim.R \
--sumstats ./data/sumstats/BMI_EAS.txt \
--n_gwas 158284 \
--train_prop 0.75 \
--ref_prefix ./data/LEOPARD_1kg_ref/EAS/EAS_part1 \
--seed 42 \
--rep 4 \
--out_prefix ./results/LEOPARD/sampled_sumstats/BMI_EAS
- The format of the input GWAS summary statistics is the same as LOGODetect.
- The output for Step 3.1 contains 3 files for each replication with 4 replications in total. For example.
BMI_rep1_train.txt
,BMI_rep1_valid.txt
, andBMI_rep1_train_valid_N.txt
represent the GWAS summary statistics for the training, validation set, and the sample size for training and validation set, respectively.
Step 3.2: Fit the PRS model using sampled target population GWAS summary statistics from the training set.
Next, we fit the PANTHER PRS model using the full European GWAS summary statistics and subsampled East Asian GWAS summary statistics for training set:
cd example
mkdir ./results/LEOPARD/post; mkdir ./results/LEOPARD/post_collect
for index in {1..4}
do
for chr in {1..22};do
for pst_pop in {EUR EAS};do
python /X-Wing/PANTHER.py \
--ref_dir ./data/PANTHER_LEOPARD_1kg_ref \
--bim_prefix ./data/test \
--sumstats ./data/sumstats/BMI_EUR.txt,./results/LEOPARD/sampled_sumstats/BMI_EAS_rep${index}_train.txt \
--n_gwas 359983,118713 \
--anno_file ./results/LOGODetect/annot_EUR.txt,./results/LOGODetect/annot_EAS.txt \
--chrom ${chr} \
--pop EUR,EAS \
--target_pop EAS \
--pst_pop ${pst_pop} \
--out_name BMI_${index} \
--seed 3 \
--out_dir ./results/LEOPARD/post
done
done
done
- We used for loop in this example, but in practice it could be time-consuming. We highly recommend parallel computation for the 22 autosomes * 4 replicates * 2 populations = 176 jobs.
- Since
PANTHER
fit the model and outputs the results for each chromosome separately. We merge each output file for 22 autosomes into one file by:for index in {1..4} do cat ./results/LEOPARD/post/BMI_${index}_EUR_pst_eff_phiauto_chr*.txt > ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt sed -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt cat ./results/LEOPARD/post/BMI_${index}_EAS_pst_eff_phiauto_chr*.txt > ./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt sed -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt done
At last, we estimate the linear combination weights using the estimated SNP posterior mean effects from Step 3.2 and subsampled East Asian GWAS summary statistics for validation set:
cd example
mkdir ./results/LEOPARD/weights
# Estimating the linear combination weights
for index in {1..4}
do
Rscript /X-Wing/LEOPARD_Weights.R \
--beta_file ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt,./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt \
--valid_file ./results/LEOPARD/sampled_sumstats/BMI_EAS_rep${index}_valid.txt \
--n_valid 39571 \
--ref_prefix ./data/LEOPARD_1kg_ref/EAS/EAS_part3 \
--out ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep${index}.txt
done
-
The output of Step 3.3 is the linear combination weights for multiple population-specific PRS. Here is the result for replication 1:
head ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep1.txt Path Weights ./results/LEOPARD/Post_collect/BMI_1_EUR_Post.txt 95.3952930409908 ./results/LEOPARD/Post_collect/BMI_1_EAS_Post.txt 202.295367124474
where columns in order are the PATH to the file contains SNP posterior effects and linear combination weights. Each row is the result for each PRS included.
-
The final linear combination weights is the average of the relative weights across 4 replications. We used
LEOPARD_avg.R
to produce the it:Rscript /X-Wing/LEOPARD_Avg.R \ --weights_prefix ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep \ --rep 4 \ --out ./results/LEOPARD/weights/BMI_LEOPARD_weights_avg.txt
The results is
head ./results/LEOPARD/weights/BMI_LEOPARD_weights_avg.txt Path Weights ./results/LEOPARD/post_collect/BMI_1_EUR_Post.txt 0.411651331616073 ./results/LEOPARD/post_collect/BMI_1_EAS_Post.txt 0.588348668383927
-
Thus, the final linearly combined PRS is 0.412 x EUR_PRS + 0.588 x EAS_PRS. We recommend centering each population-specific PRS in the target dataset when calculating the final linearly combined PRS.