Skip to content

autosome-ru/hocomoco_rsnp_benchmarks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hocomoco_rsnp_benchmarks

A pipeline for ChIP-Seq and HT-SELEX motif benchmarking for HOCOMOCO v12. It requires additional software to be placed in the specified directories.

Python requirements

Your Python version must be 3.8 or higher.

Python libraries

You must have these Python packages installed:

  • NumPy
  • pandas
  • SciPy
  • scikit-learn

Additional software

  1. MACRO-PERFECTOS-APE — place the file ape.jar into the directory ./external_programs.
  2. Bedtools — place the file bedtools.static into the directory ./external_programs.
  3. SPRY-SARUS v2.0.2 — place the file sarus-2.0.2.jar into the directory ./external_programs.

Adjusting the threads number

To increase the number of threads for computing write the exact number into ./procfile without any other symbols. The default value is 1 which means single-threaded computing.

Usage

Execute ./autorun.sh in this very directory!

Input data

Ten motifs of the transcription factor FOXA2 were chosen as demonstration motifs. These matrices are placed in the ./pwm directory.

Custom motifs

  • You can benchmark your own models placing them into the ./pwm directory.
  • The file name must start with transcription factor name separated from the rest of the PWM name with @ symbol. The extension of the file must be .pwm.
  • ADASTRA and HT-SELEX data for a custom transcription factor must be placed in the directories ./adastra/TF and ./selex/batchX respectively where batchX refers to batch1 or batch2 depending on the set of HT-SELEX experiments. The names of these files must include transcription factor name only. The extension of the files must be .tsv.
  • Genome files must be placed in the ./assembly directory. These must be GRCh37 (hg19) and GRCh38 (hg38) human genome assemblies. The file names must be hg19.fa and hg38.fa respectively.

Output data

The output data is stored in the ./results directory. The results for ChIP-Seq both batches of HT-SELEX are written in the files adastra_motifs.tsv and selex_motifs.tsv respectively.

Authors

The benchmark was written by Mikhail Nikonov.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published