Name		Name	Last commit message	Last commit date
parent directory ..
beta_approx		beta_approx
slurm		slurm
README.md		README.md
batch_eqtl_localaa_globalaa.sh		batch_eqtl_localaa_globalaa.sh
concat_cov.R		concat_cov.R
eqtl_localaa_globalaa.R		eqtl_localaa_globalaa.R
extract-egenes.py		extract-egenes.py
filter_expression_admixed.R		filter_expression_admixed.R
get_uniq_egenes.R		get_uniq_egenes.R
prepare_covariates.sh		prepare_covariates.sh

README.md

eQTL calling

Prepare VCFs

The SHAPEIT2-phased, MAF 0.01-filtered GTEx v8 WGS VCF was converted to hg19 using LiftOver for compatibility with the GRCh37 HapMap genetic map and hg19 1000 Genomes (liftOver-hg38Tohg19.sh). This VCF was then split into chromosome-level VCFs, each of which was then filtered to include only 117AX samples and MAC > 10 using vcftools:

vcfin=genotypes-hg19
vcfdir=admixed/geno

for chr in {1..22}; do
  vcf_unfilt=${vcfin}/gtex.MAF01.phased.chr${chr}.vcf.gz
  vcf_out=${vcfdir}/gtex.admixed.MAC10.phased.chr${chr}
  vcftools \
    --gzvcf ${vcf_unfilt} \
    --keep ${admix_ids} \
    --recode-INFO-all \
    --mac 10 \
    --max-alleles 2 \
    --min-alleles 2 \
    --out ${vcf_out} \
    --recode 
done

gzip ${vcfdir}/gtex.admixed.MAC10.phased.chr*

Prepare covariates

prepare_covariates.sh concatenates covariates provided by GTEx and new PEER factors. It requires paths to two GTEx v8 files:

${tissue}.v8.normalized_expression.bed.gz (hg38 coordinates now)
${tissue}.v8.covariates.txt

It also calls 3 other scripts:

It has a few outputs:

${tissue}.v8.normalized_expression.admixed_subset.bed: expression in 117AX (however many have samples in this tissue)
other_covariates.txt: necessary covariates extracted from the GTEx v8 covariates file
peer folder with 3 outputs, including ${tissue}.PEER_covariates.txt, which are the PEER covariates calculated using only 117AX samples
${tissue}.all_covariates.txt: combined covariates. This file will be used for eQTL calling.

Run eQTL calling with both LocalAA and GlobalAA

batch_eqtl_localaa_globalaa.sh splits the expression file for a tissue into chunks of 50 genes (line 49) and runs eqtl_localaa_globalaa.R on each chunk of genes per chromosome.

eqtl_localaa_globalaa.R requires several inputs (see arguments):

chr: Chromosome number
tissue: Tissue specified with GTEx file prefix, e.g. Muscle_Skeletal
exprfile: Path to GTEx v8 expression file for this tissue
globalcov: Path to ${tissue}.all_covariates.txt from the previous step
geno: Path to filtered VCF for this chromosome (from this step)
localcov: Path to local ancestry covariates for this chromosome from this step
out: Output file. Results from each test are added to this file as they are generated. This file is analogous to GTEx's "allpairs" files

batch_eqtl_localaa_globalaa.sh was written to parallelize eQTL calling on a cluster without a job submission system. As written, it restricts processes to 9 cores. If you have access to a HPC with a job submission system like SGE or SLURM, I highly recommend parallelizing this process as much as possible. See scripts in eqtl/slurm for an example of how to do this with SLURM.

eqtl_localaa_globalaa.R generates a progress bar for its corresponding chunk for your sanity. Once all chunks are done running, concatenate the results for each tissue (see the end of batch_eqtl_localaa_globalaa.sh).

Now let's assume that, for each tissue, you have a master allpairs file called ${tissue}-LocalAA-GlobalAA-allpairs-merged.tsv.gz. These files are very large. Some of the downstream steps are facilitated by having a filtered allpairs file that only includes tests where either the GlobalAA or LocalAA nominal p-value is less than 0.05. Generate this filtered allpairs file as follows:

zcat zcat ${tissue}-LocalAA-GlobalAA-allpairs-merged.tsv.gz | head -1 > ${tissue}-LocalAA-GlobalAA-allpairs-merged-filt.tsv
zcat ${tissue}-LocalAA-GlobalAA-allpairs-merged.tsv.gz | \
  awk -F "\t" '{ if(($6 < 0.05) || ($9 < 0.05)) { print } }' \
  >> ${tissue}-LocalAA-GlobalAA-allpairs-merged-filt.tsv
gzip ${tissue}-LocalAA-GlobalAA-allpairs-merged-filt.tsv

Get tied lead SNPs for each gene

extract-egenes.py identifies the most significant SNP(s) for each tested gene. If multiple SNPs have the same smallest p-value, all tied lead SNPs are reported. No significance threshold is applied; the lead SNP(s) is reported for all tested genes.

The input should be the unfiltered allpairs file for a given tissue. There are two outputs:

gtex.admix.global.egenes.tied.txt.gz: Lead SNPs according to GlobalAA
gtex.admix.lava.egenes.tied.txt.gz: Lead SNPs according to LocalAA

Generate some eGene sets that are repeatedly used in downstream analyses

get_uniq_egenes.R generates a few files, a few of which are specific to a particular nominal p-value cutoff:

egenes_master.RData: Lead SNPs for all tested genes; the union of gtex.admix.global.egenes.tied.txt.gz and gtex.admix.lava.egenes.tied.txt.gz for all tissues. One line per gene-SNP pair per ancestry adjustment method
allp_master.RData: Any gene-SNP pair with a nominal p-value < 0.05 (either LocalAA or GlobalAA)

Then a few files that are specific to a given nominal p-value cutoff, e.g. 1e-06:

uniq_egene_${cutoff}.RData: For each tissue, genes that have a nominal eQTL p-value less than the cutoff for only one ancestry adjustment method. Results for all tissues are concatenated
diff_lead_${cutoff}.RData: For each tissue, genes that have a nominal eQTL p-value less than the cutoff for BOTH ancestry adjustment methods but have different lead SNPs. Results for all tissues are concatenated
same_egene_diff_lead_${cutoff}.txt: Unique genes from diff_lead_${cutoff}.RData; used to filter loci for colocalization analyses later on

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

eqtl

eqtl

README.md

eQTL calling

Prepare VCFs

Prepare covariates

Run eQTL calling with both LocalAA and GlobalAA

Get tied lead SNPs for each gene

Generate some eGene sets that are repeatedly used in downstream analyses

Files

eqtl

Directory actions

More options

Directory actions

More options

Latest commit

History

eqtl

Folders and files

parent directory

README.md

eQTL calling

Prepare VCFs

Prepare covariates

Run eQTL calling with both LocalAA and GlobalAA

Get tied lead SNPs for each gene

Generate some eGene sets that are repeatedly used in downstream analyses