CLEAR is built to run on any system with Anaconda Python (see: https://www.anaconda.com/download/) properly installed.
The following packages should additionally be installed:
matplotlib
numpy
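As a quick sanity check before running CLEAR, the sketch below reports which of the packages listed above cannot be found by the active Python interpreter. The helper name `missing_packages` is hypothetical and is not part of CLEAR.

```python
import importlib.util

# Packages named in the requirements list above.
REQUIRED_PACKAGES = ("matplotlib", "numpy")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```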
Processing samples from the human genome requires approximately 1 GB of RAM, but this can vary based on the size of the reference genome and the complexity of the coverage map.
- Ensure that python is properly installed and available on the system path.
- Clone the CLEAR repository into a working folder for installation.
- Retrieve the needed reference files in NCBI table format.
- Example download locations:
- NCBI RefSeq GRCh38: http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/ncbiRefSeq.txt.gz
- NCBI RefSeq GRCm38: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/ncbiRefSeq.txt.gz
- Ensure that reference files are uncompressed.
- Use an external tool such as BedTools to create a .bed file of coverages from each aligned sample to be interrogated.
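The example reference files above are distributed as .gz archives, so they need to be decompressed before use. A minimal sketch using only the Python standard library is shown here; the function name `gunzip` is hypothetical, and any other decompression tool works equally well.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress the .gz file `src` and return the path of the output file.

    If `dest` is not given, the output path is `src` without its final
    ".gz" suffix (for example ncbiRefSeq.txt.gz -> ncbiRefSeq.txt).
    """
    src = Path(src)
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are fine
    return dest


# Example (assumes the file has already been downloaded):
# gunzip("ncbiRefSeq.txt.gz")
```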
For each .bed file, do the following:
- Run
python make_dat.py [Name of reference file] [Name of the .bed file from above]
, which will generate a file ending in .dat with the mu callings.
- Run
python fitter.py [Name of dat file generated in (1)]
, which will print the passing transcripts to the terminal.
  - This can be written to a file using
python fitter.py [Name of dat file generated in (1)] > file_name
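The two per-sample steps above can be scripted. The sketch below only builds the command lines and redirects the second command's output to a file; it assumes it is run from the CLEAR folder. Because the exact name of the generated .dat file is not stated here, it is passed in explicitly. The helper names and the file names in the usage comment are hypothetical.

```python
import subprocess
import sys


def build_commands(reference, bed_file, dat_file):
    """Return the make_dat.py and fitter.py command lines for one sample."""
    make_dat = [sys.executable, "make_dat.py", reference, bed_file]
    fitter = [sys.executable, "fitter.py", dat_file]
    return make_dat, fitter


def run_to_file(command, out_path):
    """Run `command` and write its standard output to `out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(command, stdout=out, check=True)


# Usage (hypothetical file names):
# make_dat, fitter = build_commands("ncbiRefSeq.txt", "sample1.bed", "sample1.dat")
# subprocess.run(make_dat, check=True)
# run_to_file(fitter, "sample1_transcripts.txt")
```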
Once a transcript name file has been produced for each sample, run the grouper.py command on all of them as follows:
python grouper.py [name of file 1] [name of file 2] [name of...]
You can also add the --require-samples [#] parameter, where [#] is the number of samples a transcript must appear in to be included in the output. This can be used to relax the "passing in all samples" requirement used in the manuscript.
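To illustrate what the sample threshold means, the sketch below keeps transcripts that appear in at least a given number of samples, defaulting to all of them. It is an illustration of the idea only, not the implementation used by grouper.py; the function name and the transcript names in the example are hypothetical.

```python
from collections import Counter


def group_transcripts(samples, require_samples=None):
    """Return sorted transcripts present in at least `require_samples` samples.

    `samples` is a list of iterables of transcript names, one per sample.
    With require_samples=None, a transcript must appear in every sample.
    """
    if require_samples is None:
        require_samples = len(samples)
    counts = Counter()
    for names in samples:
        counts.update(set(names))  # count each transcript once per sample
    return sorted(t for t, n in counts.items() if n >= require_samples)


# Hypothetical transcript names for three samples.
samples = [["tx_A", "tx_B"], ["tx_A"], ["tx_A", "tx_B"]]
print(group_transcripts(samples))                     # strict: all samples
print(group_transcripts(samples, require_samples=2))  # relaxed threshold
```

With the strict default only tx_A is kept; lowering the threshold to 2 also keeps tx_B.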
See wrapper.sh for a complete wrapper for running the steps outlined above (bash wrapper.sh).
- Open the folder with all previously generated .dat files.
- Run
python make_violin_plots.py
to create a file CLEAR_violins.pdf containing violin plots of all samples in the folder.
An example case containing data from 6 cells from Zeisel et al. [1] is included, allowing you to test your installation.
To run it, simply run cd example && bash run_example.sh
The expected results are in the result folder.
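One way to check an example run against the expected results is to compare the two folders file by file. The sketch below assumes that a correct run reproduces the expected files byte for byte and that the newly produced files are in a folder of your choosing; neither is confirmed here, and the function name is hypothetical.

```python
import filecmp
import os


def compare_folders(produced_dir, expected_dir):
    """Compare the top-level files of two folders by content.

    Returns a dict with files that differ, files missing from
    `produced_dir`, and files present only in `produced_dir`.
    """
    produced = set(os.listdir(produced_dir))
    expected = set(os.listdir(expected_dir))
    common = sorted(produced & expected)
    # shallow=False forces a byte-by-byte comparison of file contents.
    _, mismatch, errors = filecmp.cmpfiles(
        produced_dir, expected_dir, common, shallow=False
    )
    return {
        # Entries that could not be compared (e.g. subfolders) count as differing.
        "differ": sorted(mismatch + errors),
        "missing": sorted(expected - produced),
        "unexpected": sorted(produced - expected),
    }


# Usage (hypothetical output folder name):
# print(compare_folders("my_output", "result"))
```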
[1] Zeisel A, Muñoz-Manchado AB, Codeluppi S, Lönnerberg P et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 2015 Mar 6;347(6226):1138-42. PMID: 25700174