Identification of multiple RNAs using feature fusion

cbl-nabi/RNAC

RNAC - RNA Classifier

RNAC classifies RNA transcripts into coding (cRNA), housekeeping (hkRNA), small non-coding (sncRNA) and long non-coding (lncRNA) classes using statistical, Local Binary Pattern (LBP) and histogram features computed from genomic descriptors. It is implemented in Python 3.
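The feature fusion described above can be illustrated with a minimal sketch, assuming a genomic descriptor is a 1-D numeric vector sampled along a transcript. The function names (`lbp_1d`, `fused_features`), the LBP radius, the summary statistics and the number of histogram bins are all illustrative assumptions, not the encoding used in CALC_LBP.py or CALC_FEAT.py.

```python
import numpy as np


def lbp_1d(signal, radius=4):
    """Illustrative 1-D local binary pattern (not RNAC's exact encoding).

    Each interior point is compared with `radius` neighbours on each side;
    a neighbour >= the centre contributes a 1 bit, and bit k is weighted by 2**k.
    """
    x = np.asarray(signal, dtype=float)
    weights = 2 ** np.arange(2 * radius)
    codes = []
    for i in range(radius, len(x) - radius):
        neighbours = np.concatenate([x[i - radius:i], x[i + 1:i + radius + 1]])
        bits = (neighbours >= x[i]).astype(int)
        codes.append(int(bits @ weights))
    return np.array(codes, dtype=int)


def fused_features(descriptor, radius=4, value_bins=10):
    """Concatenate statistical, value-histogram and LBP-histogram features.

    The statistics and bin counts are assumptions chosen for illustration.
    """
    x = np.asarray(descriptor, dtype=float)
    # Statistical features of the raw descriptor values.
    stats = np.array([x.mean(), x.std(), x.min(), x.max(), np.median(x)])
    # Normalized histogram of the descriptor values.
    value_hist, _ = np.histogram(x, bins=value_bins)
    value_hist = value_hist / max(value_hist.sum(), 1)
    # Normalized histogram of the LBP codes (2**(2*radius) possible codes).
    lbp_hist = np.bincount(lbp_1d(x, radius), minlength=2 ** (2 * radius)).astype(float)
    lbp_hist /= max(lbp_hist.sum(), 1)
    # Feature fusion by simple concatenation.
    return np.concatenate([stats, value_hist, lbp_hist])
```

With the defaults, `fused_features` returns 5 + 10 + 256 = 271 values per descriptor; in a real pipeline the vectors from several descriptors would be concatenated again before classification.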

Details of Source Code

  • DESC - Genomic descriptors for all species.
  • LBPDESC - LBP codes of genomic descriptors for all species.
  • PRECOMPUTED_FEAT - Pre-computed features for all species.
  • MODELS - XGBoost models for multiclass and binary classification problems.
  • UTILS - Additional files for feature extraction, data normalization and feature selection.
  • TEST_SAMPLES - Sample GTF files for testing.
  • TEST_OUTPUT - Classification outputs.
  • RNAC.py - Code for classifying unknown transcripts.
  • CALC_FEAT.py - Code for feature extraction.
  • CALC_LBP.py - Code for computing LBP codes of genomic descriptors.
  • DATASETS - Test RNA sequences used to evaluate the performance of RNAC.

How to Use

  1. Download the code and install the required packages:
* python == 3.7.0
* h5py == 2.10.0
* scipy == 1.5.2
* xgboost == 1.3.0
* numpy == 1.19.1
* pandas == 1.1.5
* scikit_learn == 0.24.1
  2. Download the supporting data from the links provided in the DESC, LBPDESC, PRECOMPUTED_FEAT and MODELS folders.

  3. Using RNAC for testing

Run RNAC.py to obtain multiclass or binary predictions, for example:

python RNAC.py --Multiclass --Human Human.gtf

For help on the parameters, run python RNAC.py -h (or --help).

Usage: python RNAC.py [--classification-type] [--species] [-gtf_file]

classification-type: 	 {Multiclass, Binary}
species: 		 {Human, Mouse, Caenorhabditis_elegans, Arabidopsis_thaliana}
gtf_file: 		 GTF file location

Example: python RNAC.py --Binary --Caenorhabditis_elegans ./TEST_SAMPLES/Caenorhabditis_elegans.gtf
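When RNAC has to be run for several species or GTF files, the documented command line can be assembled from Python. This sketch only uses the option values listed in the usage message above; the helper names `build_rnac_command` and `run_rnac` are hypothetical, and it assumes RNAC.py is in the working directory and that `python` resolves to the environment with the required packages.

```python
import shlex
import subprocess

# Option values taken from the RNAC usage message.
CLASSIFICATION_TYPES = ("Multiclass", "Binary")
SPECIES = ("Human", "Mouse", "Caenorhabditis_elegans", "Arabidopsis_thaliana")


def build_rnac_command(classification, species, gtf_file, script="RNAC.py"):
    """Return the argument list for one RNAC run (hypothetical helper)."""
    if classification not in CLASSIFICATION_TYPES:
        raise ValueError(f"unsupported classification type: {classification!r}")
    if species not in SPECIES:
        raise ValueError(f"unsupported species: {species!r}")
    return ["python", script, f"--{classification}", f"--{species}", gtf_file]


def run_rnac(classification, species, gtf_file):
    """Print and run the RNAC command; raises CalledProcessError on failure."""
    cmd = build_rnac_command(classification, species, gtf_file)
    print(" ".join(shlex.quote(arg) for arg in cmd))
    subprocess.run(cmd, check=True)
```

For instance, `run_rnac("Binary", "Caenorhabditis_elegans", "./TEST_SAMPLES/Caenorhabditis_elegans.gtf")` issues the same command as the example above. Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.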

Note: RNAC supports Multiclass (cRNA, hkRNA, sncRNA and lncRNA) and Binary (coding vs. non-coding) classification for these four species only.
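Because the binary task is coding vs. non-coding, multiclass labels can be collapsed onto it when comparing the two kinds of output: cRNA is the only coding class, while hkRNA, sncRNA and lncRNA are all non-coding. This is a minimal sketch; the output strings "coding" and "non-coding" are assumptions and may not match what RNAC writes to TEST_OUTPUT. Since MODELS contains separate binary models, the mapping is meant for comparing outputs rather than as a description of how RNAC derives its binary predictions.

```python
# Multiclass labels as named in this README; the binary label strings are assumptions.
CODING_CLASSES = {"cRNA"}
NONCODING_CLASSES = {"hkRNA", "sncRNA", "lncRNA"}


def to_binary(label):
    """Collapse a multiclass RNAC label onto the coding / non-coding task."""
    if label in CODING_CLASSES:
        return "coding"
    if label in NONCODING_CLASSES:
        return "non-coding"
    raise ValueError(f"unknown RNA class: {label!r}")
```

For example, `[to_binary(c) for c in ["cRNA", "lncRNA"]]` gives `["coding", "non-coding"]`.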

The code has been tested with Python 3.6 (conda) on Ubuntu 14.04.

Sample GTF files are provided in the TEST_SAMPLES folder to verify that RNAC has been installed correctly.
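Before running RNAC on your own annotation, it can help to check that the file is well-formed GTF. This sketch relies only on the general GTF layout (nine tab-separated columns, with `transcript_id "..."` in the attribute column); which features and attributes RNAC actually reads is not stated here, and the helper names `summarize_gtf` and `summarize_gtf_file` are hypothetical.

```python
import re
from collections import Counter

# transcript_id "..." inside the ninth (attribute) column of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def summarize_gtf(lines):
    """Count feature types and collect transcript IDs; fail on malformed lines."""
    features = Counter()
    transcripts = set()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"line {lineno}: expected 9 tab-separated fields, got {len(fields)}")
        features[fields[2]] += 1  # column 3 is the feature type (exon, transcript, ...)
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            transcripts.add(match.group(1))
    return features, transcripts


def summarize_gtf_file(path):
    """Apply summarize_gtf to a GTF file on disk."""
    with open(path) as handle:
        return summarize_gtf(handle)
```

Running `summarize_gtf_file` on one of the files in TEST_SAMPLES and on your own file gives a quick side-by-side view of the feature types and the number of transcripts each contains.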

  4. Datasets

The DATASETS folder contains the RNA sequences of Human, Mouse, C. elegans and A. thaliana, so that the performance of other tools can be compared directly with RNAC.
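To compare another tool with RNAC on these datasets, predictions for the same transcripts can be scored with standard metrics. The sketch below computes accuracy, per-class precision, recall and F1, and macro-averaged F1 in plain Python; the choice of metrics and the function name `summarize_predictions` are assumptions, since this README does not say which measures the paper reports. scikit-learn, already in the requirements, offers equivalent functions in `sklearn.metrics` (for example `classification_report`).

```python
from collections import Counter


def summarize_predictions(y_true, y_pred):
    """Return (accuracy, macro_f1, per-class report) for two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    correct = Counter(t for t, p in pairs if t == p)  # true positives per class
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    report = {}
    for label in labels:
        tp = correct[label]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(correct.values()) / len(pairs) if pairs else 0.0
    macro_f1 = sum(r["f1"] for r in report.values()) / len(report) if report else 0.0
    return accuracy, macro_f1, report
```

Macro-averaging weights every RNA class equally, which matters when small classes such as sncRNA are far less frequent than cRNA in a dataset.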

Citation

Singh D., Madhawan A. and Roy J., Identification of multiple RNAs using feature fusion, 2021

Please also cite the COME paper.