ORE identifies outlier genes that carry more rare variants than expected by chance (and vice versa). Paper in Bioinformatics.
Basic use of ORE (outlier-RV enrichment) is described here; visit the latest ORE documentation for more details. Confirm the following are installed:
Then, on the command line, install with:
pip install ore
Example run
ore --vcf test.vcf.gz \
--bed test.bed.gz \
--output ore_results \
--distribution normal \
--threshold 2 3 4 \
--max_outliers_per_id 500 \
--af_rare 0.05 0.01 1e-3 \
--tss_dist 5000
Variants and gene expression are specified with --vcf (line 1) and --bed (line 2), respectively, and the output prefix with --output (line 3). In this example, the outlier specifications --distribution (line 4), --threshold (line 5), and --max_outliers_per_id (line 6) indicate that outliers are defined from a normal distribution using absolute z-score thresholds of 2, 3, and 4, and that samples with more than 500 outliers are excluded. Variant information is specified with --af_rare (line 7) and --tss_dist (line 8): variants are defined as rare if their intra-cohort allele frequency is at or below each of several thresholds (≤ 0.05, 0.01, and 0.001), and only variants within 5 kb of the TSS are used.
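The outlier definition in this example can be illustrated with a small, self-contained sketch. It assumes that, with --distribution normal, each gene's expression is standardized across samples (population standard deviation) and that a (gene, sample) pair is an outlier when its absolute z-score exceeds the threshold. The dictionary input layout and the names zscores and call_outliers are hypothetical and are not ORE's internal API; check the ORE documentation for how the BED values are actually normalized.

```python
from collections import Counter
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's values across samples (population SD, an assumption)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant expression: no outliers possible
    return [(v - mu) / sd for v in values]


def call_outliers(expr, threshold, max_outliers_per_id):
    """(gene, sample) pairs with |z| > threshold, after a per-sample cap.

    expr maps gene -> {sample: expression} (hypothetical layout).
    Samples with more than max_outliers_per_id outliers are dropped entirely.
    """
    outliers = set()
    for gene, by_sample in expr.items():
        samples = list(by_sample)
        z = zscores([by_sample[s] for s in samples])
        outliers.update((gene, s) for s, score in zip(samples, z) if abs(score) > threshold)
    per_sample = Counter(s for _, s in outliers)
    return {(g, s) for g, s in outliers if per_sample[s] <= max_outliers_per_id}


# Toy data: sample s9 is the outlier in both genes (z = 3.0 and z = -3.0)
samples = [f"s{i}" for i in range(10)]
expr = {
    "geneA": {s: (10 if s == "s9" else 0) for s in samples},
    "geneB": {s: (-10 if s == "s9" else 0) for s in samples},
}
# One outlier set per value of --threshold 2 3 4
results = {t: call_outliers(expr, t, max_outliers_per_id=500) for t in (2, 3, 4)}
print(sorted(results[2]))  # [('geneA', 's9'), ('geneB', 's9')]
print(results[4])          # set()
```

Because s9 is an outlier in both genes, lowering max_outliers_per_id to 1 would drop s9 entirely, which is the same kind of exclusion that --max_outliers_per_id 500 applies to samples with more than 500 outliers.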
Usage (visit the latest ORE documentation for more details):
ore [-h] [--version] -v VCF -b BED [-o OUTPUT] [--outlier_output OUTLIER_OUTPUT] [--enrich_file ENRICH_FILE] [--extrema] [--distribution {normal,rank,custom}] [--threshold [THRESHOLD [THRESHOLD ...]]] [--max_outliers_per_id MAX_OUTLIERS_PER_ID] [--af_rare [AF_RARE [AF_RARE ...]]] [--af_vcf] [--intracohort_rare_ac INTRACOHORT_RARE_AC] [--af_min [AF_MIN [AF_MIN ...]]] [--gq GQ] [--dp DP] [--aar AAR AAR] [--tss_dist [TSS_DIST [TSS_DIST ...]]] [--upstream] [--downstream] [--annovar] [--variant_class {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}] [--exon_class {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}] [--refgene] [--ensgene] [--annovar_dir ANNOVAR_DIR] [--humandb_dir HUMANDB_DIR] [--processes PROCESSES] [--clean_run]
- Required arguments:
  -v VCF, --vcf VCF  Location of the VCF file. Must be tabix-indexed!
  -b BED, --bed BED  Location of the gene expression file. Must be tabix-indexed!
- Optional file locations:
  -o OUTPUT, --output OUTPUT  Output prefix (default is the VCF prefix)
  --outlier_output OUTLIER_OUTPUT  Outlier filename (default is the VCF prefix)
  --enrich_file ENRICH_FILE  Output file for enrichment odds ratios and p-values (default is the VCF prefix)
- Optional outlier arguments:
  --extrema  Only the most extreme value is an outlier
  --distribution DISTRIBUTION  Outlier distribution. Options: {normal,rank,custom}
  --threshold THRESHOLD  Expression threshold for defining outliers. Must be greater than 0 with normal, or strictly between 0 and 0.5 with rank. Ignored with custom
  --max_outliers_per_id MAX_OUTLIERS_PER_ID  Maximum number of outliers per ID
- Optional variant-related arguments:
  --af_rare AF_RARE  AF cut-off below which a variant is considered rare (space-separated list, e.g., 0.1 0.05)
  --af_vcf  Use the VCF AF field to define an allele as rare
  --intracohort_rare_ac INTRACOHORT_RARE_AC  Allele COUNT to use instead of the intra-cohort allele frequency (af_rare is still used for the population-level AF cut-off)
  --af_min AF_MIN  Lower bound on the AF cut-offs for --af_rare; must be the same length as --af_rare (e.g., with --af_rare 0.01 0.5 and --af_min 0 0.05, ORE compares variants within [0, 0.01] and [0.05, 0.5] to other variants)
  --gq GQ  Minimum genotype quality for each variant in each individual
  --dp DP  Minimum depth for each variant in each individual
  --aar AAR AAR  Alternate allelic ratio for heterozygous variants (two space-separated numbers between 0 and 1, e.g., 0.2 0.8)
  --tss_dist TSS_DIST  Only variants within this distance of the TSS are considered
  --upstream  Only variants UPstream of the TSS
  --downstream  Only variants DOWNstream of the TSS
- Optional arguments for using ANNOVAR:
  --annovar  Use ANNOVAR to specify allele frequencies and functional class
  --variant_class  Only variants in these classes will be considered. Options: {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA}
  --exon_class  Only variants with these exonic impacts will be considered. Options: {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}
  --refgene  Filter on RefGene function
  --ensgene  Filter on ENSEMBL function
  --annovar_dir ANNOVAR_DIR  Directory of the table_annovar.pl script
  --humandb_dir HUMANDB_DIR  Directory of ANNOVAR data (refGene, ensGene, and gnomad_genome)
- Other optional arguments:
  -h, --help  Show this help message and exit
  --version  Show program's version number and exit
  --processes PROCESSES  Number of CPU processes
  --clean_run  Delete temporary files from the previous run
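The --distribution rank and --extrema options in the help text above can be read as follows. This is a hypothetical interpretation, not ORE's code. With rank, a sample is flagged when it falls in the lowest or highest threshold fraction of a gene's expression values, which would explain why the threshold must lie strictly between 0 and 0.5. With --extrema, at most one sample per gene is kept: the one furthest from the mean, and only if it still passes the z-score threshold. The helper names rank_outliers and extreme_outlier are illustrative.

```python
from statistics import mean, pstdev


def rank_outliers(by_sample, threshold):
    """Samples in the lowest or highest `threshold` fraction of one gene.

    Hypothetical reading of --distribution rank; ties are broken by sort order.
    """
    if not 0 < threshold < 0.5:
        raise ValueError("rank threshold must be strictly between 0 and 0.5")
    ordered = sorted(by_sample, key=by_sample.get)  # ascending expression
    n = len(ordered)
    flagged = set()
    for i, sample in enumerate(ordered):
        low_frac = (i + 1) / n   # fraction of samples at or below this one
        high_frac = (n - i) / n  # fraction of samples at or above this one
        if min(low_frac, high_frac) <= threshold:
            flagged.add(sample)
    return flagged


def extreme_outlier(by_sample, threshold):
    """Mimic --extrema for z-scores: at most one outlier per gene.

    Returns the sample furthest from the mean if its |z| exceeds
    `threshold`, otherwise None.
    """
    values = list(by_sample.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return None
    sample = max(by_sample, key=lambda s: abs(by_sample[s] - mu))
    return sample if abs(by_sample[sample] - mu) / sd > threshold else None


expr = {f"s{i}": float(i) for i in range(10)}  # s0 lowest ... s9 highest
print(sorted(rank_outliers(expr, 0.1)))  # ['s0', 's9']
```

Under this reading, a rank threshold of 0.1 flags the bottom and top 10% of samples for every gene regardless of how far they sit from the rest, whereas the z-score approach flags samples only when they are far from the mean.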
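The variant-related arguments combine into a per-variant filter. The sketch below is one plausible reading under labeled assumptions. Intra-cohort AF is computed from per-individual alternate-allele counts (0, 1, or 2). Each --af_rare cut-off is paired with the matching --af_min lower bound (0 when --af_min is absent) as a closed interval, following the [0, 0.01] example in the help text. Upstream and downstream are measured relative to the gene's strand, and --aar is applied only to heterozygous genotypes. All function names are hypothetical.

```python
def intracohort_af(alt_counts):
    """Alternate-allele frequency from per-individual counts (0, 1 or 2)."""
    return sum(alt_counts) / (2 * len(alt_counts))


def af_bins(af_rare, af_min=None):
    """Pair each --af_rare upper cut-off with its --af_min lower bound."""
    if af_min is None:
        af_min = [0.0] * len(af_rare)  # assumed default: no lower bound
    if len(af_min) != len(af_rare):
        raise ValueError("--af_min must be the same length as --af_rare")
    return list(zip(af_min, af_rare))


def matching_bins(af, bins):
    """All inclusive (lo, hi) intervals that contain this allele frequency."""
    return [(lo, hi) for lo, hi in bins if lo <= af <= hi]


def near_tss(pos, tss, strand, tss_dist, upstream=False, downstream=False):
    """Within tss_dist of the TSS, optionally restricted to one side."""
    # Positive offset = downstream (toward the gene body) on either strand
    offset = pos - tss if strand == "+" else tss - pos
    if abs(offset) > tss_dist:
        return False
    if upstream:
        return offset < 0
    if downstream:
        return offset > 0
    return True


def passes_genotype_qc(gq, dp, alt_ratio, is_het, min_gq=0, min_dp=0, aar=None):
    """Per-individual filters in the spirit of --gq, --dp and --aar."""
    if gq < min_gq or dp < min_dp:
        return False
    if is_het and aar is not None:
        lo, hi = aar
        return lo <= alt_ratio <= hi
    return True


# Mirrors the example run: --af_rare 0.05 0.01 1e-3 --tss_dist 5000
bins = af_bins([0.05, 0.01, 1e-3])
af = intracohort_af([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # 1 alt allele / 20 = 0.05
print(matching_bins(af, bins))              # [(0.0, 0.05)]
print(near_tss(10_400, 14_000, "+", 5000))  # True: 3.6 kb upstream
```

Returning every matching interval, rather than only the tightest one, reflects the idea that each --af_rare cut-off defines its own analysis, so one variant can count as rare at 0.05 but not at 0.01.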
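The enrichment file reports odds ratios and p-values. A common way to obtain them, assumed here rather than taken from ORE's source, is a 2x2 table that cross-classifies (gene, sample) pairs by outlier status and by whether a qualifying rare variant is present, tested with Fisher's exact test. The sketch uses only the standard library. The function names are hypothetical; in practice scipy.stats.fisher_exact computes the same sample odds ratio and two-sided p-value.

```python
from math import comb, inf, nan


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]], where
    a = outliers with a rare variant,      b = outliers without one,
    c = non-outliers with a rare variant,  d = non-outliers without one."""
    if b * c == 0:
        return inf if a * d > 0 else nan
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value with row and column totals held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance absorbs floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: 8 of 10 outliers vs 1 of 6 non-outliers carry a rare variant
print(odds_ratio(8, 2, 1, 5))                        # 20.0
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))  # 0.035
```

An odds ratio above 1 corresponds to the "more rare variants than expected by chance" direction and one below 1 to the "vice versa" direction; repeating the test for each outlier threshold and each rare-AF cut-off would give one row per combination.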
Felix Richter <felix.richter@icahn.mssm.edu>