2. Real example for X‐Wing

Here we provide a concrete example to run X-Wing. We use BMI European GWAS summary statistics from UKBB (N = 359,983) and East Asian GWAS summary statistics from BBJ (N = 158,284) as the training data and 1000G as the reference panel. The goal in is to calculate the BMI PRS for the target East Asian sample.

A detailed explanation for the available flags can be found in X-Wing.

Step 0: Download the required data and LD from reference panel for the example.

After downloading X-Wing, you can download all the example data and required LD from reference panel using

#!/bin/bash
cd X-Wing
bash download_example.sh

Step 1. Obtain the annotation based on cross-population local genetic correlation using `LOGODetect`

Applying LOGODetect

#!/bin/bash

cd example

mkdir ./results/LOGODetect

Rscript /X-Wing/LOGODetect.R \
--sumstats ./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt \
--n_gwas 359983,158284 \
--ref_dir ./data/LOGODetect_1kg_ref \
--pop EUR,EAS \
--block_partition /X-Wing/block_partition.txt \
--gc_snp /X-Wing/1kg_hm3_snp.txt \
--out_dir ./results/LOGODetect \
--n_cores 20 \
--target_pop EAS \
--n_topregion 1000

The first few line of GWAS summary statistics is

head ./data/sumstats/BMI_EUR.txt

CHR	  SNP	          BP	       A1	A2	BETA	                P
1	  rs12238997	  693731	G	A	-0.276816514205779	0.781921
1	  rs58276399	  731718	C	T	-0.412794747579417	0.679757
1	  rs61770163	  732032	C	A	-0.056528239021207	0.954921

The output files include the regions with significant local genetic correlation and the annotation based on local genetic correlation as shown below

head ./results/LOGODetect/LOGODetect_regions.txt

chr    begin_pos    stop_pos     stat                pval                   qval
1      32159588     32175927     13.9007303828433    0.00479904019196161    0.0107599532725034
1      74992278     75006328     23.659452286243     0.0001999600079984     0.00107756226532471
1      107872936    107899979    13.5101005165975    0.0017996400719856     0.00492533072332902

head ./results/LOGODetect/annot_EUR.txt

CHR    SNP           A1    A2    Anno
1      rs12238997    G     A     0
1      rs58276399    C     T     0
1      rs61770163    C     A     0

Step 2. Calculate the SNPs posterior effect size by incorporating the portability-informed annotation into `PANTHER`

Fit PANTHER PRS model

Here, we fit the PANTHER PRS model using European and East Asian GWAS summary statistics, and incorporate the cross-population local genetic correlation annotation derived from Step1 into PRS modeling; We specify that our target population is East Asian and use 1000G data as reference panel; We specify that our target sample plink bim file is located in ./data/test.bim.

#!/bin/bash

cd example

mkdir ./results/PANTHER; mkdir ./results/PANTHER/post; mkdir ./results/PANTHER/post_collect

for chr in {1..22};do
  for pst_pop in {EUR EAS};do
    python \
    /X-Wing/PANTHER.py \
    --ref_dir ./data/PANTHER_1kg_ref \
    --bim_prefix ./data/test \
    --sumstats ./data/sumstats/BMI_EUR.txt,./data/sumstats/BMI_EAS.txt \
    --n_gwas 359983,158284 \
    --anno_file ./results/LOGODetect/annot_EUR.txt,./results/LOGODetect/annot_EAS.txt \
    --chrom ${chr} \
    --pop EUR,EAS \
    --target_pop EAS \
    --pst_pop ${pst_pop} \
    --out_name BMI \
    --seed 3 \
    --out_dir ./results/PANTHER/post
  done
done

The format of the input GWAS summary statistics is the same as LOGODetect.
The format of the input annotation file is the same as the output annotation file for LOGODetect.
We used for loop in this example, but in practice it can be time-consuming. We highly recommend parallel computation for the 22 autosomes and each posterior populations.

The output file of step2 is

head ./results/PANTHER/post/BMI_EAS_pst_eff_phiauto_chr1.txt

1	  rs4475691	  846808	T	C	7.715496e-05
1	  rs7537756	  854250	G	A	-5.351412e-05
1	  rs3748592	  880238	A	G	-2.837376e-04

where the columns in order are chromosome, SNP rs ID, BP, A1, A2, and posterior effect size estimate. Each row is a SNP.

PANTHER fit the model for each chromosome separately. One can merge each output file for 22 autosomes into one file by:

cat ./results/PANTHER/post/BMI_EUR_pst_eff_phiauto_chr*.txt >     ./results/PANTHER/post_collect/BMI_EUR_Post.txt
sed  -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/PANTHER/post_collect/BMI_EUR_Post.txt

cat ./results/PANTHER/post/BMI_EAS_pst_eff_phiauto_chr*.txt > ./results/PANTHER/post_collect/BMI_EAS_Post.txt
sed  -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/PANTHER/post_collect/BMI_EAS_Post.txt

Step 3: Estimate the linear combination weights for PRS from multiple populations using `LEOPARD`

Step 3.1: Subsample GWAS summary statistics for training and validation sets.

Here, given the full GWAS summary statistics from East Asian population, we try to subsample the GWAS summary statistics for the training and validation set. We set the proportion of the training set is 0.75 and replicate the subsampling 4 times:

cd example

mkdir ./results/LEOPARD/sampled_sumstats

Rscript /X-Wing/LEOPARD_Sim.R \
--sumstats ./data/sumstats/BMI_EAS.txt \
--n_gwas 158284 \
--train_prop 0.75 \
--ref_prefix ./data/LEOPARD_1kg_ref/EAS/EAS_part1 \
--seed 42 \
--rep 4 \
--out_prefix ./results/LEOPARD/sampled_sumstats/BMI_EAS

The format of the input GWAS summary statistics is the same as LOGODetect.
The output for Step 3.1 contains 3 files for each replication with 4 replications in total. For example. BMI_rep1_train.txt, BMI_rep1_valid.txt, and BMI_rep1_train_valid_N.txt represent the GWAS summary statistics for the training, validation set, and the sample size for training and validation set, respectively.

Step 3.2: Fit the PRS model using sampled target population GWAS summary statistics from the training set.

Next, we fit the PANTHER PRS model using the full European GWAS summary statistics and subsampled East Asian GWAS summary statistics for training set:

cd example

mkdir ./results/LEOPARD/post; mkdir ./results/LEOPARD/post_collect

for index in {1..4}
do
  for chr in {1..22};do
    for pst_pop in {EUR EAS};do
        python /X-Wing/PANTHER.py \
        --ref_dir ./data/PANTHER_LEOPARD_1kg_ref \
        --bim_prefix ./data/test \
        --sumstats ./data/sumstats/BMI_EUR.txt,./results/LEOPARD/sampled_sumstats/BMI_EAS_rep${index}_train.txt \
        --n_gwas 359983,118713 \
        --anno_file ./results/LOGODetect/annot_EUR.txt,./results/LOGODetect/annot_EAS.txt \
        --chrom ${chr} \
        --pop EUR,EAS \
        --target_pop EAS \
        --pst_pop ${pst_pop} \
        --out_name BMI_${index} \
        --seed 3 \
        --out_dir ./results/LEOPARD/post
      done
    done
done

We used for loop in this example, but in practice it could be time-consuming. We highly recommend parallel computation for the 22 autosomes * 4 replicates * 2 populations = 176 jobs.

Since PANTHER fit the model and outputs the results for each chromosome separately. We merge each output file for 22 autosomes into one file by:

for index in {1..4}
do
  cat ./results/LEOPARD/post/BMI_${index}_EUR_pst_eff_phiauto_chr*.txt > ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt
  sed  -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt

  cat ./results/LEOPARD/post/BMI_${index}_EAS_pst_eff_phiauto_chr*.txt > ./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt
  sed  -i '1iCHR\tSNP\tBP\tA1\tA2\tBETA' ./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt
done

Step 3.3: Estimate the linear combination weights.

At last, we estimate the linear combination weights using the estimated SNP posterior mean effects from Step 3.2 and subsampled East Asian GWAS summary statistics for validation set:

cd example
mkdir ./results/LEOPARD/weights

# Estimating the linear combination weights
for index in {1..4}
do
    Rscript /X-Wing/LEOPARD_Weights.R \
    --beta_file ./results/LEOPARD/post_collect/BMI_${index}_EUR_Post.txt,./results/LEOPARD/post_collect/BMI_${index}_EAS_Post.txt \
    --valid_file ./results/LEOPARD/sampled_sumstats/BMI_EAS_rep${index}_valid.txt \
    --n_valid 39571 \
    --ref_prefix ./data/LEOPARD_1kg_ref/EAS/EAS_part3 \
    --out ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep${index}.txt
done

The output of Step 3.3 is the linear combination weights for multiple population-specific PRS. Here is the result for replication 1:
```
head ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep1.txt

Path	                                                  Weights
./results/LEOPARD/Post_collect/BMI_1_EUR_Post.txt	  95.3952930409908
./results/LEOPARD/Post_collect/BMI_1_EAS_Post.txt	  202.295367124474
```
where columns in order are the PATH to the file contains SNP posterior effects and linear combination weights. Each row is the result for each PRS included.

The final linear combination weights is the average of the relative weights across 4 replications. We used LEOPARD_avg.R to produce the it:

Rscript /X-Wing/LEOPARD_Avg.R \
    --weights_prefix ./results/LEOPARD/weights/BMI_LEOPARD_weights_rep \
    --rep 4 \
    --out ./results/LEOPARD/weights/BMI_LEOPARD_weights_avg.txt

The results is

head ./results/LEOPARD/weights/BMI_LEOPARD_weights_avg.txt

Path	                                                Weights
./results/LEOPARD/post_collect/BMI_1_EUR_Post.txt	0.411651331616073
./results/LEOPARD/post_collect/BMI_1_EAS_Post.txt	0.588348668383927

Thus, the final linearly combined PRS is 0.412 x EUR_PRS + 0.588 x EAS_PRS. We recommend centering each population-specific PRS in the target dataset when calculating the final linearly combined PRS.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2. Real example for X‐Wing

Step 0: Download the required data and LD from reference panel for the example.

Step 1. Obtain the annotation based on cross-population local genetic correlation using `LOGODetect`

Applying LOGODetect

Step 2. Calculate the SNPs posterior effect size by incorporating the portability-informed annotation into `PANTHER`

Fit PANTHER PRS model

Step 3: Estimate the linear combination weights for PRS from multiple populations using `LEOPARD`

Step 3.1: Subsample GWAS summary statistics for training and validation sets.

Step 3.2: Fit the PRS model using sampled target population GWAS summary statistics from the training set.

Step 3.3: Estimate the linear combination weights.

Clone this wiki locally

2. Real example for X‐Wing

Step 0: Download the required data and LD from reference panel for the example.

Step 1. Obtain the annotation based on cross-population local genetic correlation using LOGODetect

Applying LOGODetect

Step 2. Calculate the SNPs posterior effect size by incorporating the portability-informed annotation into PANTHER

Fit PANTHER PRS model

Step 3: Estimate the linear combination weights for PRS from multiple populations using LEOPARD

Step 3.1: Subsample GWAS summary statistics for training and validation sets.

Step 3.2: Fit the PRS model using sampled target population GWAS summary statistics from the training set.

Step 3.3: Estimate the linear combination weights.

Clone this wiki locally

Step 1. Obtain the annotation based on cross-population local genetic correlation using `LOGODetect`

Step 2. Calculate the SNPs posterior effect size by incorporating the portability-informed annotation into `PANTHER`

Step 3: Estimate the linear combination weights for PRS from multiple populations using `LEOPARD`