ashleys-qc

Automated Selection of High quality Libraries for the Extensive analYsis of Strandseq data (ASHLEYS)

ASHLEYS is developed on Linux using Python 3.7. For a full working example of how to use ASHLEYS, please take a look at the processing pipeline. Note that the preprocessing steps in this pipeline, e.g. short-read alignment and duplicate read marking, are always required to prepare suitable input files for ASHLEYS; the pipeline code itself, however, is only an example implementation and not part of ASHLEYS per se.

Setup

Clone the repository via

git clone https://github.com/friendsofstrandseq/ashleys-qc.git ashleys-qc
cd ashleys-qc

Then create and activate the conda environment:

conda env create -f environment/ashleys_env.yml
conda activate ashleys

For final setup, run

python setup.py install

You should now be able to list all available modules with

./bin/ashleys.py --help

Feature Generation

Compute features for one or more BAM files for one or more window sizes. For a detailed explanation of which features are computed, please refer to the feature documentation.

Example usage that generates all features required by the pretrained models, for all .bam files in the specified directory:

./bin/ashleys.py -j 23 features -f [folder_with_bamfiles] -w 5000000 2000000 1000000 \
 800000 600000 400000 200000 -o [feature_table.tsv]
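
When the feature step is driven from a script, the command above can be assembled programmatically. The following is a minimal sketch that only reuses the flags and window sizes shown in the example; the helper name `build_features_command` and the placeholder paths are assumptions made for illustration and are not part of ASHLEYS.

```python
import shlex

# Window sizes (in bp) copied from the example command above.
WINDOW_SIZES = [5_000_000, 2_000_000, 1_000_000, 800_000, 600_000, 400_000, 200_000]


def build_features_command(bam_folder, output_table, jobs=23,
                           windows=WINDOW_SIZES, executable="./bin/ashleys.py"):
    """Return the argument list for the `features` call shown above.

    Hypothetical helper: it only mirrors the flags from the example
    (-j, features, -f, -w, -o) and does not call ASHLEYS itself.
    """
    return [
        executable, "-j", str(jobs), "features",
        "-f", str(bam_folder),
        "-w", *[str(w) for w in windows],
        "-o", str(output_table),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own folder and output file.
    cmd = build_features_command("folder_with_bamfiles", "feature_table.tsv")
    print(shlex.join(cmd))
    # To execute the command, pass the list to subprocess.run(cmd, check=True).
```

Building the call as a list rather than as a single string avoids shell quoting problems when folder names contain spaces.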

Model Training

Train a new classification model based on an annotation file specifying class 1 cells. The model is trained with support vector classification, with hyperparameters selected by grid search.
Example usage:

./bin/ashleys.py train -p [feature_table.tsv] -a [annotation.txt] -o [output.tsv]
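
To illustrate what support vector classification with a hyperparameter grid search looks like in scikit-learn, here is a minimal sketch. It is not the ASHLEYS training code: the parameter grid, the number of cross-validation folds, and the synthetic data standing in for a feature table are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def train_svc_with_grid_search(X, y, param_grid=None, cv=5):
    """Fit an SVC, choosing hyperparameters by cross-validated grid search.

    The default grid is illustrative only; it is not the grid used by ASHLEYS.
    """
    if param_grid is None:
        param_grid = {
            "C": [0.1, 1, 10],
            "gamma": ["scale", 0.01, 0.1],
            "kernel": ["rbf"],
        }
    search = GridSearchCV(SVC(), param_grid, cv=cv)
    search.fit(X, y)
    return search


# Synthetic two-class data standing in for a real feature table (assumption).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
search = train_svc_with_grid_search(X, y)
print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the classifier refitted on all data with the selected hyperparameters.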

Prediction

Predict the class probabilities for new cells using either the pretrained models or custom models.
The default model trained with support vector classification should identify low-quality cells of new data with high confidence. For detailed information about the generated files, please refer to the output interpretation.

Example usage for prediction based on this pretrained model:

./bin/ashleys.py predict -p [feature_table.tsv] -o [output_folder] -m models/svc_default.pkl

When using the pretrained models, scikit-learn v0.23.2 must be installed, as the models were generated with this version. For custom models, a newer version of scikit-learn can also be used.
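
A quick check of the installed scikit-learn version before loading the pretrained models can catch a mismatch early. The sketch below is an illustration only; the helper names are hypothetical and ASHLEYS may perform no such check itself.

```python
# Version stated above as required for the pretrained models.
REQUIRED_SKLEARN = "0.23.2"


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it is missing."""
    try:
        import sklearn
    except ImportError:
        return None
    return sklearn.__version__


def sklearn_matches(installed, required=REQUIRED_SKLEARN):
    """Return True if the installed version string equals the required one."""
    return installed.strip() == required


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed")
    elif sklearn_matches(found):
        print("scikit-learn", found, "matches the pretrained models")
    else:
        print("scikit-learn", found, "found, but the pretrained models expect",
              REQUIRED_SKLEARN)
```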

Plotting

Plot the distribution of prediction probabilities.
Example usage:

./bin/ashleys.py plot -p [output_folder]/prediction.tsv -o [output_plot]

Test Data

Example prediction on the test data that directly compares the predicted classes with the true annotation:

./bin/ashleys.py predict -p data/test_features.tsv -o prediction.tsv \
-m models/svc_default.pkl -a data/test_annotation.txt
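
Conceptually, comparing predictions with an annotation of class 1 cells amounts to counting agreements and disagreements per cell. The sketch below shows that idea on in-memory data. It makes no claim about the file layouts or the metrics ASHLEYS reports: the function name, the cell names, and the 0/1 label encoding are assumptions for illustration.

```python
def compare_to_annotation(predicted, class1_cells):
    """Count agreement between predicted labels and a class 1 annotation.

    predicted:    dict mapping cell name -> predicted label (0 or 1), assumed encoding
    class1_cells: set of cell names annotated as class 1; all others count as class 0
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for cell, label in predicted.items():
        truth = 1 if cell in class1_cells else 0
        if label == 1 and truth == 1:
            counts["tp"] += 1
        elif label == 1 and truth == 0:
            counts["fp"] += 1
        elif label == 0 and truth == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    total = sum(counts.values())
    accuracy = (counts["tp"] + counts["tn"]) / total if total else float("nan")
    return counts, accuracy


# Hypothetical cells and labels, for illustration only.
predicted = {"cell_a": 1, "cell_b": 0, "cell_c": 1, "cell_d": 0}
class1_cells = {"cell_a", "cell_d"}
counts, accuracy = compare_to_annotation(predicted, class1_cells)
print(counts, accuracy)
```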
