
pysster: a Sequence-STructure classifiER

Learning Sequence And Structure Motifs In Biological Sequences Using Convolutional Neural Networks

pysster is a Python package for training and interpreting convolutional neural networks on biological sequence data. Sequences are classified by learning sequence (and optionally structure) motifs. The package offers sensible default parameters, a hyper-parameter optimization procedure and options to visualize learned motifs. The main features of the package are:

multi-class and single-label or multi-label classifications
hyper-parameter tuning (grid search)
interpretation of learned motifs in terms of positional and class enrichment and motif co-occurrence
support of input strings over user-defined alphabets (e.g. applicable to DNA, RNA, protein data)
optional use of structure information, handcrafted features and recurrent layers
seamless CPU or GPU computation
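
To illustrate what "input strings over user-defined alphabets" means in practice, here is a minimal sketch of one-hot encoding, the usual way of turning a string into numerical input for a convolutional network. The function name one_hot_encode is hypothetical and this is not pysster's internal implementation; it only shows that the same procedure applies to any alphabet (DNA, RNA, protein).

```python
def one_hot_encode(sequence, alphabet):
    """Encode a string as a list of rows, one row per position.

    Each row has one column per alphabet symbol and contains a single 1.
    Hypothetical helper for illustration, not part of the pysster API.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    if len(index) != len(alphabet):
        raise ValueError("alphabet contains duplicate symbols")
    encoded = []
    for symbol in sequence:
        if symbol not in index:
            raise ValueError(
                "symbol {!r} is not in alphabet {!r}".format(symbol, alphabet)
            )
        row = [0] * len(alphabet)
        row[index[symbol]] = 1
        encoded.append(row)
    return encoded


# The same function handles an RNA alphabet and a protein alphabet.
rna = one_hot_encode("GA", "ACGU")
protein = one_hot_encode("MKV", "ACDEFGHIKLMNPQRSTVWY")
print(rna)  # [[0, 0, 1, 0], [1, 0, 0, 0]]
```

Raising an error on unknown symbols is a design choice of this sketch; how real data with ambiguous characters should be handled depends on the application.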

If you found our tool useful for your work, please cite the accompanying Bioinformatics paper (link). If you run into bugs, missing documentation or if you have a feature request, feel free to open an issue.

Installation

pysster is compatible with Python 3.5+ and can be installed from PyPI or GitHub.

Install latest version from GitHub:

git clone https://github.com/budach/pysster.git
cd pysster
pip3 install .

Install from PyPI:

pip3 install pysster

Using the GPU

pysster depends on TensorFlow, and by default the CPU version of TensorFlow will be installed. If you want to use your NVIDIA GPU (which is recommended for large data sets or grid searches), make sure that your CUDA and cuDNN drivers are correctly installed and then install the GPU version of TensorFlow:

pip3 uninstall tensorflow
pip3 install tensorflow-gpu

At the time of writing the most recent TensorFlow version is 1.14 and the pre-built binary requires CUDA 10 and cuDNN 7.4. You can always check the required versions in the TensorFlow GPU support notes.

Right now, we only support TensorFlow 1.x. TensorFlow 2 has recently been released and we plan to switch to it and its integrated tf.keras in the future.
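
If you are unsure whether an installed TensorFlow falls into the supported 1.x range, a small check like the following may help. This is only a sketch: the function name is_supported_tensorflow is hypothetical and it assumes the version string starts with a numeric major version (as in "1.14.0").

```python
def is_supported_tensorflow(version):
    """Return True if the version string belongs to the 1.x series.

    Hypothetical helper for illustration, not part of pysster.
    """
    major = int(version.split(".")[0])
    return major == 1


print(is_supported_tensorflow("1.14.0"))  # True
print(is_supported_tensorflow("2.0.0"))   # False

# With TensorFlow installed, the same check can be applied to the
# installed version:
#   import tensorflow as tf
#   is_supported_tensorflow(tf.__version__)
```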

Documentation

Tutorials

Example workflow (data loading, model training via grid search, model evaluation + motif visualization showcased using an RNA editing data set)
Visualization by optimization of all network layers (an alternative visualization method showcased using an artificial data set)
Limitations of Neural Networks (some critical thoughts on networks applied to sequence data)

API documentation

Data objects (handling of input data)
Model objects (training and interpretation of networks)
Grid_Search objects (hyperparameter tuning)
Motif objects (motif representation of a PWM)
utils functions (save/load Data/Model objects, predict/annotate secondary structures, further processing, etc.)

Changelog

v1.2.2 - 22. October 2019 (PyPI)

pin TensorFlow version to < 2.0 for now

v1.2.1 - 28. February 2019 (PyPI)

small fix to be compatible with the forgi 2.0 dependency

v1.2.0 - 6. December 2018 (PyPI)

breaking change: the load_additional_data() method now requires a new parameter categories containing all possible categories when adding categorical data
input dropout is now also applied to data loaded via load_additional_data()
performance improvements when creating large Data objects and when visualizing kernels
fixed a crash when printing grid search summaries involving RNN layers

v1.1.4 - 17. July 2018 (PyPI)

added load_additional_positionwise_data() method to Data objects (add arbitrary numerical features for every sequence position; learned features can be visualized for each kernel using the usual Model methods)
the positive class ("class_0") will now be used as the reference class when computing AUCs in binary classifications (previously the negative class was used)
some small fixes

v1.1.3 - 19. March 2018 (PyPI)

added visualize_all_kernels() method to Model objects (visualize all kernels at once + get HTML summary report)
it is now possible to maximize the PR-AUC (precision-recall) instead of the ROC-AUC during a grid search
changed default color scheme for ACGT and ACGU alphabets to match conventions
fixed a bug that prevented Data objects from being reproducible
