predTED

Calculating the pairwise Tree Edit Distance (TED) for RNA structures can be very time-consuming, especially for a large number of structures. Therefore, I have implemented a method that predicts an approximate Tree Edit Distance from various features of the Dot Bracket Notation. The predictions can be used to prefilter the data, so that the precise calculation is run later only on the remaining candidates.

Installation

Create the Conda environment:
- If you have Conda installed, use:
```
conda env create -f environment.yml --yes
```
- If you have Mamba installed, use:
```
mamba env create -f environment.yml --yes
```
Activate the environment:
```
conda activate predTED
```

Usage

Run the prediction model

C Version

Compile the C code:

- Build LightGBM as a static library:

   git clone --recursive https://github.com/microsoft/LightGBM.git
   cd LightGBM
   mkdir build
   cd build
   cmake -DBUILD_STATIC_LIB=ON ..
   make -j
   cd ../..

- Compile predTED against the static library:

   # Optional: regenerate model.h from model.txt before compiling
   # xxd -i model.txt > model.h
   g++ -O3 -march=native -mtune=native -ffast-math -funroll-loops -fopenmp \
       -DNDEBUG -I LightGBM/include predTED.c LightGBM/lib_lightgbm.a -lm -o predTED


- Optionally, copy the binary to your local bin folder so that it is available on your PATH:

   cp predTED ~/.local/bin/predTED

Run the prediction:
- Execute the compiled program with two RNA structures in Dot Bracket Notation:
```
./predTED "((..))" "(()).."
```
- This will output the predicted Tree Edit Distance.
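
Both arguments are plain strings of `(`, `)` and `.`. Whether predTED itself rejects malformed input is not stated here, so it can be worth checking structures before passing them on. The following is a minimal sketch of such a check; the function name `is_valid_dot_bracket` is hypothetical and not part of predTED, and the sketch assumes structures without pseudoknot brackets.

```python
def is_valid_dot_bracket(structure: str) -> bool:
    """Return True if the string contains only '(', ')' and '.' and is balanced."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # Closing bracket without a matching opening bracket.
                return False
        elif ch != ".":
            # Any other character (e.g. pseudoknot brackets) is rejected here.
            return False
    # Every opening bracket must have been closed.
    return depth == 0


for s in ["((..))", "(())..", "(()"]:
    print(s, is_valid_dot_bracket(s))
```

The first two structures are the ones used in the example above and pass the check; the third has an unclosed bracket and fails.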

Using predTED as a Python Library

To utilise predTED as a Python library, follow these steps:

Ensure the environment is activated:
- If not already activated, run:
```
conda activate predTED
```
Import the predTED module:
- In your Python script, import the module:
```
import predTED
```
Prepare your RNA structures:
- Define your RNA structures in Dot Bracket Notation as strings.

Call the predict_TED function:
- Pass the two structures, the weights, the number of weights, and the intercept.

Example:

   import predTED

   # weights, num_weights and intercept are the model parameters;
   # they must be defined before this call.
   struct1 = "((..))"
   struct2 = "(()).."
   predicted_ted = predTED.predict_TED(struct1, struct2, weights, num_weights, intercept)
   print(f"Predicted TED: {predicted_ted}")
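
The predicted distance is meant as a prefilter for later precise calculations. One possible way to use it is sketched below: all pairs whose predicted TED is at most a chosen threshold are kept as candidates for the exact computation. The function `prefilter_pairs`, the threshold, and the placeholder predictor `length_difference` are assumptions made for illustration and are not part of predTED. In practice the predictor could be a small wrapper around `predTED.predict_TED` with your weights and intercept.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def prefilter_pairs(
    structures: Sequence[str],
    predict: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose predicted distance is <= threshold."""
    candidates = []
    for i, j in combinations(range(len(structures)), 2):
        if predict(structures[i], structures[j]) <= threshold:
            candidates.append((i, j))
    return candidates


def length_difference(a: str, b: str) -> float:
    """Placeholder predictor used only to make this sketch runnable."""
    return float(abs(len(a) - len(b)))


structures = ["((..))", "(())..", "((((....))))"]
print(prefilter_pairs(structures, length_difference, threshold=2.0))  # [(0, 1)]
```

Only the pairs returned by the prefilter would then be passed to the exact Tree Edit Distance calculation.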

Visualisation

The repository includes a plot (ted_prediction_plot.png) showing the relationship between predicted and true Tree Edit Distances. It illustrates the accuracy of the prediction model by comparing the predicted TED values with the actual TED values.

Creating your own feature weights

Prepare your data:
- Ensure you have the RNA structures in a file named structures.txt, with one structure per line in Dot Bracket Notation.
- Ensure you have the pairwise Tree Edit Distance matrix in a file named ted_matrix.txt, with space-separated values.
Run the script:
- Execute the main script to compute and save the feature weights:
```
python compute_feature_weights.py
```
- The script will compute the feature weights and save them in feature_weights.json, sorted by the absolute value of their weights.
Interpret the results:
- The feature_weights.json file contains the weights for each feature, indicating their importance in predicting the Tree Edit Distance.
- Features with larger absolute weights are more important for the prediction.
Generate model.h:
- To make the model available to predTED, convert it into a C constant:
```
xxd -i model.txt > model.h
```
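
Before running the script, it can help to check that the two input files from the "Prepare your data" step fit together. The sketch below loads both files and verifies their shapes. It assumes that ted_matrix.txt holds a square matrix with one row per structure, in the same order as structures.txt; this layout is an assumption, and the helper `load_training_data` is hypothetical and not part of the repository.

```python
from pathlib import Path


def load_training_data(structures_path="structures.txt", matrix_path="ted_matrix.txt"):
    """Load the structures and the TED matrix and check that their sizes match."""
    # One structure per line; blank lines are ignored.
    structures = [
        line.strip()
        for line in Path(structures_path).read_text().splitlines()
        if line.strip()
    ]
    # One matrix row per line, values separated by whitespace.
    matrix = [
        [float(value) for value in line.split()]
        for line in Path(matrix_path).read_text().splitlines()
        if line.strip()
    ]
    n = len(structures)
    # Assumption: the matrix is n x n, with row i belonging to structure i.
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError(f"expected a {n} x {n} matrix for {n} structures")
    return structures, matrix
```

Calling `load_training_data()` in the directory that contains both files either returns the parsed data or raises a `ValueError` describing the mismatch.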

Features

The following features are computed for each RNA structure:

internal_loops: Number of internal loops.
var_depth_paired: Variance of the depth of paired bases.
multiloops: Number of multiloops.
max_loop: Size of the largest loop.
length: Length of the structure.
Many more features are computed in addition to these.
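
To illustrate how features of this kind can be derived from a Dot Bracket string, here is a minimal sketch that computes two of the features listed above. The exact definitions used by predTED are not given here, so the ones in the sketch are assumptions: the depth of a paired base is taken as its nesting level (1 for the outermost pair), and the variance is the population variance. The function name `simple_features` is hypothetical.

```python
from statistics import pvariance


def simple_features(structure: str) -> dict:
    """Compute length and var_depth_paired for one Dot Bracket string."""
    depths = []  # nesting depth of every paired base
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
            depths.append(depth)
        elif ch == ")":
            # A closing base has the same depth as its opening partner.
            depths.append(depth)
            depth -= 1
    return {
        "length": len(structure),
        # Assumption: population variance; 0.0 if there are no paired bases.
        "var_depth_paired": pvariance(depths) if depths else 0.0,
    }


print(simple_features("((..))"))  # {'length': 6, 'var_depth_paired': 0.25}
```

For "((..))" the paired bases have depths 1, 2, 2, 1, which gives a mean of 1.5 and a population variance of 0.25.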

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or issues, please open an issue on the GitHub repository (syseitz/predTED).
