Will-Raymond/human_riboswitch_hits

Identification of potential riboswitch elements in Homo sapiens mRNA 5'UTR sequences using Positive-Unlabeled machine learning


William S. Raymond¹, Jacob DeRoo¹, Brian Munsky¹,²

¹ School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
² Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA

This repository contains all the final data files used for the analysis in the above manuscript.


├─  alns/ - final alignment images for the website
│		├─ aln_%%ID%%_%%ENS_NORM%%.png
├─  data_files/ - raw/processed data files used to make the feature sets
│		├─ CCDS_nucleotide.current_11.28.2021.fa
│		├─ riboswitch_RNAcentral_8.19.21.json
│		├─ rs_dot.json
│		├─ RSid_to_ligand.json
│		├─ 5primeUTR_final_db_2.csv
│		├─ 5primeUTR_newutrdb_ML2.csv
│		├─ RS_final_db.csv
│		├─ RS_id_to_ligand.csv
│		├─ check_new_utr.py
│		├─ data_processor.py
│		├─ process_rna_central_rs_json.py
│		├─ rna_central_dot_structure_scraper.py
│		├─ species_in_RS_set.txt
├─  elkanoto_models/ - PUlearn models
│		├─ EKmodel_witheld_w_struct_features_9_26_%%LIGAND%%_%%LIGAND%%.joblib
│		├─ load_ensemble.py
│		├─ utr_proba_norm.npy
├─  ensemble_predictions/ - predictions from the ensemble classifier
│		├─ all_utr_predictions.npy
│		├─ final_set_436.json
│		├─ final_set_1533.json
│		├─ utr_dot_hits_1533.json
│		├─ utr_proba_norm.npy
│		├─ utr_proba_1533.npy
├─  feature_npy_files/ - extracted feature data arrays
│		├─ X_RS_full.npy
│		├─ X_UTR.npy
│		├─ ids_UTR.npy
│		├─ ids_RS.npy
├─  Figures/ - figure files for the paper
├─  GO/ - GO analysis output files
│		├─ component_436.txt
│		├─ component_1533.txt
│		├─ function_436.txt
│		├─ function_1533.txt
│		├─ process_436.txt
│		├─ process_1533.txt
│		├─ gene_list_436.txt
│		├─ gene_list_1533.txt
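
As a starting point for working with the listing above, here is a minimal sketch that checks a local clone for the four arrays under `feature_npy_files/` and loads them with NumPy. The helper names (`missing_feature_files`, `load_feature_arrays`) and the short dictionary keys are hypothetical and not part of this repository. The sketch assumes the files are ordinary `.npy` arrays; their shapes and dtypes, and whether the rows of `ids_UTR.npy` line up with the rows of `X_UTR.npy`, are not stated here and should be checked after loading.

```python
from pathlib import Path

# Paths copied from the directory listing, relative to the repository root.
# The short keys are illustrative names chosen for this sketch.
FEATURE_FILES = {
    "X_RS": "feature_npy_files/X_RS_full.npy",
    "X_UTR": "feature_npy_files/X_UTR.npy",
    "ids_RS": "feature_npy_files/ids_RS.npy",
    "ids_UTR": "feature_npy_files/ids_UTR.npy",
}


def missing_feature_files(repo_root):
    """Return the listed feature files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in FEATURE_FILES.values() if not (root / rel).is_file()]


def load_feature_arrays(repo_root):
    """Load the four feature arrays into a dict keyed by short name.

    Assumption: the files are plain NumPy .npy arrays. If the ID arrays
    were saved with object dtype, np.load will need allow_pickle=True.
    """
    import numpy as np  # imported lazily so the file check works without NumPy

    missing = missing_feature_files(repo_root)
    if missing:
        raise FileNotFoundError(f"missing feature files: {missing}")
    root = Path(repo_root)
    return {name: np.load(root / rel) for name, rel in FEATURE_FILES.items()}
```

Calling `load_feature_arrays(".")` from the repository root would return a dictionary of arrays; printing each array's `shape` and `dtype` is a quick way to confirm how the feature matrices and ID arrays relate before using them.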

Read the manuscript here

Rerun the analysis here (Colab)

Contact info: wsraymon@rams.colostate.edu