Home
NucleicNet is hosted on our web server (http://www.cbrc.kaust.edu.sa/NucleicNet/). Here, we distribute a version that runs on Linux (CentOS/Ubuntu). Users may also refer to the other pages of this wiki for a more detailed discussion of input files and the interpretation of results.
NucleicNet depends on the following publicly available software. Users should follow each package's own installation instructions and license terms when installing these prerequisites.
- Python 3.6.7 (https://www.python.org/downloads/release/python-367/) Primary programming language
- Anaconda 5.3.1 (https://www.anaconda.com/distribution/) Coordination of Python packages
- FEATURE 3.1.0 (https://simtk.org/projects/feature) Analysis of atomic protein models
- XSSP 2.0.4 (https://github.com/cmbi/xssp) Analysis of protein secondary structure from atomic protein models
- PyMOL 2.3 (https://pymol.org/2/) Visualisation of binding pockets
- CUDA 8.0.61 and cuDNN 5.1 (https://developer.nvidia.com/rdp/cudnn-archive) GPU acceleration of deep learning operations
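Before running the pipeline, it can help to confirm that the command-line prerequisites are reachable on PATH. The sketch below is only an illustration: the executable names (`featurize` for FEATURE, `mkdssp` for XSSP, `pymol`, `nvcc` for CUDA) are assumptions about how each package installs itself, not names taken from the NucleicNet scripts, so adjust them to your installation.

```python
import shutil

# Hypothetical mapping from prerequisite to the executable we expect on PATH.
# These names are assumptions; check them against your own installation.
EXPECTED_EXECUTABLES = {
    "FEATURE 3.1.0": "featurize",  # assumed FEATURE binary name
    "XSSP 2.0.4": "mkdssp",        # assumed XSSP/DSSP binary name
    "PyMOL 2.3": "pymol",
    "CUDA 8.0": "nvcc",            # CUDA compiler driver
}


def missing_executables(expected):
    """Return the prerequisites whose executable cannot be found on PATH."""
    return [name for name, exe in expected.items() if shutil.which(exe) is None]
```

For example, `missing_executables(EXPECTED_EXECUTABLES)` returns an empty list when every assumed executable is found, and otherwise lists the prerequisites that still need attention.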
After installing the prerequisite dependencies, run the following to configure the Python environment.
conda env create -f py3_env.yml
source activate nucleicnet
To exit from the environment, run the following.
source deactivate nucleicnet
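A quick way to check that the scripts will run inside the intended environment is to inspect the `CONDA_DEFAULT_ENV` variable that conda sets on activation, together with the interpreter version. This is a minimal sketch assuming the environment is named `nucleicnet` (as in the activation command above) and targets Python 3.6 (as listed in the prerequisites); `check_environment` is a hypothetical helper, not part of the distribution.

```python
import os
import sys


def check_environment(expected_env="nucleicnet", expected_python=(3, 6)):
    """Return a list of problems with the active conda environment and Python version."""
    problems = []
    # conda exports the name of the active environment in CONDA_DEFAULT_ENV.
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            "active conda environment is %r, expected %r" % (active, expected_env)
        )
    # Compare only major.minor, e.g. (3, 6) for Python 3.6.7.
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        problems.append(
            "Python %d.%d found, expected %d.%d"
            % (tuple(sys.version_info[:2]) + tuple(expected_python))
        )
    return problems
```

An empty list means the active environment and interpreter match the assumed setup.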
NucleicNet operates on protein atomic models written in PDB format. Further requirements for the input PDB files are described on the wiki page "Specification on PDB input files". Place the PDB file(s) to be analysed in the "GridData" folder, then run the following:
# Generate features for protein atomic models
bash command_GenerateFeature.sh
# Analyse on features by deep learning module
bash command_DeepLearningModule.sh
# Organise deep learning predictions into visualisable forms
bash command_AnalysePrediction.sh
The purpose of each Python script called within these bash scripts is annotated in the scripts themselves.
The main results are stored in the "Out" folder. Supposing the input protein PDB file is "GridData/0000.pdb", the resulting output files are as follows:
- "Out/0000_pymol.pse": A PyMOL session that shows the binding pockets of each RNA constituent (i.e. the four bases A/U/C/G and the backbone constituents P/R, for phosphate and ribose). Users can open this file with "pymol Out/0000_pymol.pse" (see Fig. 3a-c).
- "Out/0000_R_logo_RNACColor.png": Optional. If the binding sites are already known from an RNA-protein complex PDB file, "NucleicNet_SequenceLogo_RNACcolor.py" can also be called to display the NucleicNet-predicted RNA binding specificity at each base location as a sequence logo. For example, if the corresponding RNA-protein complex is stored in "Control/0000.pdb" with RNA chain R, then "0000_R_logo_RNACColor.png" is the NucleicNet-predicted sequence logo indexed by the RNA residues of chain R (see Fig. 3-4).
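For batch use, the steps above can be driven from Python. The sketch below copies PDB files into "GridData", runs the three bash scripts in the documented order from the repository root (stopping at the first failure), and derives the expected PyMOL session path in "Out". The helper names (`stage_pdb_files`, `run_pipeline`, `pymol_session_for`) are hypothetical, and the sketch assumes the scripts sit in the repository root as in the commands above.

```python
import shutil
import subprocess
from pathlib import Path

# Script order as documented: features, deep learning, then analysis.
PIPELINE = [
    "command_GenerateFeature.sh",
    "command_DeepLearningModule.sh",
    "command_AnalysePrediction.sh",
]


def stage_pdb_files(pdb_files, repo_dir):
    """Copy input PDB files into <repo_dir>/GridData and return their new paths."""
    grid_dir = Path(repo_dir) / "GridData"
    grid_dir.mkdir(exist_ok=True)
    staged = []
    for pdb in map(Path, pdb_files):
        if pdb.suffix.lower() != ".pdb":
            raise ValueError("not a .pdb file: %s" % pdb)
        staged.append(Path(shutil.copy(str(pdb), str(grid_dir / pdb.name))))
    return staged


def run_pipeline(repo_dir, scripts=PIPELINE):
    """Run each bash script from repo_dir in order; raise CalledProcessError on failure."""
    for script in scripts:
        subprocess.run(["bash", script], cwd=str(repo_dir), check=True)


def pymol_session_for(pdb_path, repo_dir):
    """Expected PyMOL session for GridData/<name>.pdb, i.e. Out/<name>_pymol.pse."""
    return Path(repo_dir) / "Out" / (Path(pdb_path).stem + "_pymol.pse")
```

Using `check=True` means a failure in feature generation stops the run before the deep learning step consumes incomplete features.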
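To read a sequence logo, recall how letter heights are usually derived: at each position, the information content is log2(4) minus the Shannon entropy of the base distribution, and each base's height is its probability times that information content. The sketch below shows this standard calculation for A/U/C/G probabilities. It assumes a hypothetical input format (one dict of base probabilities per position), omits the small-sample correction, and is not necessarily how "NucleicNet_SequenceLogo_RNACcolor.py" computes its logo.

```python
import math

BASES = ("A", "U", "C", "G")


def logo_heights(position_probs):
    """Convert per-position base probabilities into sequence-logo letter heights (bits).

    position_probs: list of dicts such as {"A": 0.7, "U": 0.1, "C": 0.1, "G": 0.1}
    (a hypothetical format chosen for illustration).
    """
    heights = []
    for probs in position_probs:
        total = sum(probs.get(b, 0.0) for b in BASES)
        p = {b: probs.get(b, 0.0) / total for b in BASES}  # renormalise to sum to 1
        entropy = -sum(v * math.log2(v) for v in p.values() if v > 0)
        info = math.log2(len(BASES)) - entropy  # at most 2 bits for 4 bases
        heights.append({b: p[b] * info for b in BASES})
    return heights
```

A uniform position gives zero height for every base, while a position predicted as A with certainty gives A a height of 2 bits.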
We also include scripts and data to reproduce our study on Argonautes (See "command_AnalyseGridPrediction.sh"):
- "ExperimentalSequencing/RipSeq_HMMlogPDifference.png": NucleicNet scores of miRNA sequences for Ago binding, compared with the IP-Seq data (*.txt) stored in the "ExperimentalSequencing" folder (see Fig. 5a).
- "ExperimentalSequencing/Knockdown_Relation_All_Positive_publication.png" and "ExperimentalSequencing/Knockdown_Relation_All_Negative_publication.png": NucleicNet evaluation of miRNA loading efficiency, compared with the experimental knockdown levels (*.csv) stored in the "ExperimentalSequencing" folder (see Fig. 5b).
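As a rough illustration of the general idea of scoring sequences against a predicted binding profile and comparing the scores with experimental measurements, the sketch below sums per-position log2 odds against a uniform background and computes a Pearson correlation. This is a simple position-specific score swapped in for illustration; it is not the HMM-based scoring behind "RipSeq_HMMlogPDifference.png", and the profile format, `log_prob_score` and `pearson` are hypothetical.

```python
import math


def log_prob_score(sequence, position_probs, background=0.25, pseudocount=1e-6):
    """Sum over positions of log2(p(base) / background) for an RNA sequence.

    position_probs: one dict of base probabilities per position (hypothetical format).
    DNA-style 'T' is read as 'U'; the pseudocount avoids log(0).
    """
    if len(sequence) != len(position_probs):
        raise ValueError("sequence and profile lengths differ")
    score = 0.0
    for base, probs in zip(sequence.upper().replace("T", "U"), position_probs):
        score += math.log2((probs.get(base, 0.0) + pseudocount) / background)
    return score


def pearson(xs, ys):
    """Pearson correlation between model scores and experimental values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

In practice one would score each miRNA, load the matching experimental values from the files in "ExperimentalSequencing", and pass both lists to `pearson`; a rank correlation may be more robust if the relationship is not linear.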