
2.4 Regulatory Feature Enrichment

Mark Edward M. Gonzales edited this page May 4, 2024 · 9 revisions

Why this analysis

Most rice SNPs in genotyping assays and genotype databases are located in non-coding regions, and GWAS/QTL mapping studies likewise report many non-coding trait-associated variants. Many of these variants likely influence the activity of regulatory elements. One possible causal link is that a variant alters transcription factor binding affinity, which changes the expression of target genes and ultimately produces phenotypic variation.

To investigate variants that might affect the binding activity of transcription factors, RicePilaf searches for transcription factors whose known or predicted binding sites (provided by PlantRegMap) significantly overlap the input intervals.
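This page does not state which statistical test RicePilaf uses to decide that an overlap is "significant". As a hedged illustration of the general idea only, the sketch below assumes one common approach: split a single chromosome into fixed-size bins, then ask whether the bins covered by the input intervals contain more of one TF's binding-site bins than expected by chance, using a one-sided hypergeometric test. The function names, the bin size, and the single-chromosome simplification are all assumptions for illustration, not RicePilaf's implementation.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # Exact integer sums; Python's int / int division is correctly rounded.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def overlapping_bins(intervals, bin_size):
    """Set of bin indices touched by half-open [start, end) intervals."""
    bins = set()
    for start, end in intervals:
        bins.update(range(start // bin_size, (end - 1) // bin_size + 1))
    return bins


def tf_overlap_pvalue(query, tf_sites, genome_length, bin_size=100):
    """Hypothetical bin-based enrichment p-value for one TF on one chromosome."""
    N = -(-genome_length // bin_size)  # total number of bins (ceiling division)
    q = overlapping_bins(query, bin_size)     # bins covered by the input intervals
    t = overlapping_bins(tf_sites, bin_size)  # bins containing a binding site
    return hypergeom_sf(len(q & t), N, len(t), len(q))


# Toy data: a 10 kb chromosome, one 1 kb input interval, four binding sites,
# three of which fall inside the input interval.
p = tf_overlap_pvalue(
    query=[(0, 1000)],
    tf_sites=[(50, 60), (250, 270), (900, 910), (5000, 5010)],
    genome_length=10_000,
)
print(p)
```

A small p-value means the TF's sites fall inside the input intervals more often than a random placement of the same number of bins would predict. Running one such test per TF is what makes the multiple-testing correction in Box 5 necessary.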

How to run the analysis

The screenshot below shows the user input interface for the regulatory feature enrichment analysis.


Box 1 shows your current input intervals. Genes contained in these intervals are automatically included in the analysis.

Box 2 lets you add genes manually (MSU IDs only). These could be, for example, genes you found through the pan-genomic lift-over or text mining. You can leave this box empty.
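Boxes 1 and 2 together determine the gene list used by the analysis. The sketch below is a hypothetical illustration of how such a list could be assembled: genes fully contained in an input interval are collected, the manually entered IDs are appended, and duplicates are removed while preserving order. It assumes that "contained" means fully contained (not merely overlapping), and the gene IDs and coordinates are made-up placeholders, not MSU annotation data.

```python
def genes_in_intervals(genes, intervals):
    """IDs of genes fully contained in any (chrom, start, end) interval.

    genes: iterable of (gene_id, chrom, start, end) tuples.
    Assumption: containment means interval start <= gene start and gene end <= interval end.
    """
    hits = []
    for gene_id, chrom, g_start, g_end in genes:
        for i_chrom, i_start, i_end in intervals:
            if chrom == i_chrom and i_start <= g_start and g_end <= i_end:
                hits.append(gene_id)
                break  # one containing interval is enough
    return hits


def analysis_gene_list(genes, intervals, manual_ids):
    """Interval genes followed by manually added IDs, deduplicated in order."""
    combined = genes_in_intervals(genes, intervals) + list(manual_ids)
    return list(dict.fromkeys(combined))


# Placeholder annotation: geneB extends past the interval end, geneC is on another chromosome.
genes = [
    ("geneA", "chr01", 100, 500),
    ("geneB", "chr01", 900, 1500),
    ("geneC", "chr02", 100, 300),
]
result = analysis_gene_list(genes, [("chr01", 0, 1000)], ["geneC", "geneA"])
print(result)
```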

Box 3 lets you choose the method PlantRegMap used to predict transcription factor (TF) binding sites. "Motif scan" uses FIMO for a simple scan of TF binding motifs (beware of a high false positive rate); "motif conservation" additionally incorporates information on the conservation of promoter sequences; FunTFBS further incorporates base-varied binding affinity information and evolutionary information on the binding sites (see the PlantRegMap paper for details).
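To show why a plain motif scan tends to over-predict binding sites, here is a minimal position weight matrix (PWM) scan. This is a simplified stand-in, not FIMO: FIMO converts log-likelihood ratio scores to p-values and q-values, whereas this sketch simply keeps every forward-strand window whose log-odds score passes a fixed threshold. The motif counts, pseudocount, uniform background, and threshold are illustrative assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def column(base, n=10):
    """Hypothetical count column where only `base` was observed."""
    return {b: (n if b == base else 0) for b in BASES}


# Hypothetical 4-bp motif "TATA".
pwm = log_odds_matrix([column("T"), column("A"), column("T"), column("A")])
hits = scan("GGTATAGGTATT", pwm, threshold=5.0)
print(hits)
```

Lowering the threshold (say, to 2.0) also reports the near-match "TATT" at position 8. Across a whole genome, short motifs match by chance very often, which is the intuition behind the false positive warning above and behind methods that add conservation or other evidence.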

Box 4 lets you choose the target region in which the methods in Box 3 were run. If "Promoters" is chosen, overlap significance is computed between your input intervals and TF binding sites predicted in promoter regions (defined by PlantRegMap as 500 bp upstream to 100 bp downstream of the transcription start site). If "genome" is chosen, TF binding sites across the whole genome are considered (beware of a high false positive rate).
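The promoter definition above is strand-dependent: "upstream" points toward lower coordinates for plus-strand genes and toward higher coordinates for minus-strand genes. The sketch below computes such a window as a 0-based, half-open interval. The exact boundary convention (here, the TSS base is counted as the first downstream base) and the function names are assumptions; PlantRegMap's own files may differ by a base at either end.

```python
def promoter_interval(tss, strand, upstream=500, downstream=100):
    """Half-open [start, end) promoter window around a 0-based TSS.

    Assumption: the TSS base itself counts as the first downstream base.
    The start is clamped at 0 for genes near the chromosome start.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(start, 0), end


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


plus = promoter_interval(10_000, "+")
minus = promoter_interval(10_000, "-")
print(plus, minus)
print(overlaps(plus, (9_000, 9_600)))
```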

Box 5 is where you set the FDR (false discovery rate) threshold. Because an overlap significance test is computed for every TF in the database, the app adjusts the resulting p-values for multiple testing using the Benjamini-Hochberg method.
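For reference, the Benjamini-Hochberg adjustment can be written in a few lines: sort the m p-values, scale the p-value of rank i by m / i, then take a running minimum from the largest rank downward so the adjusted values stay monotone. The sketch below follows that standard procedure and then keeps the TFs whose adjusted p-value is at or below the FDR, sorted by adjusted p-value. The `significant_tfs` helper and the TF names are hypothetical; this is not RicePilaf's code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_tfs(tf_pvalues, fdr=0.05):
    """Hypothetical helper: (tf, p, adjusted p) rows passing the FDR, sorted by adjusted p."""
    names = list(tf_pvalues)
    adj = benjamini_hochberg([tf_pvalues[n] for n in names])
    rows = [(n, tf_pvalues[n], q) for n, q in zip(names, adj) if q <= fdr]
    return sorted(rows, key=lambda row: row[2])


pvals = {"tf_a": 0.01, "tf_b": 0.04, "tf_c": 0.03, "tf_d": 0.005}
adjusted = benjamini_hochberg(list(pvals.values()))
table = significant_tfs(pvals, fdr=0.03)
print(adjusted)
print(table)
```

Here tf_b and tf_c both end up with an adjusted p-value of 0.04, so an FDR of 0.03 keeps only tf_a and tf_d.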

Interpreting the results

The result is a table sorted by adjusted p-value.