WebApp

This work is now also available as a web app at schafft-wissen.org/CALVI. The web app offers additional features, such as sorting the sequences by the number of user-selected features rather than by the number of all features. The user can also adjust the number of sequences shown. More features will be added to the app. The standalone script below still works but will no longer be maintained.

Introduction

The file Latest/Standalone_Annotate_Alignment_V11.py provides several Python functions (Python 3.6.8 or later) that annotate and highlight positions in a Clustal-formatted alignment. The result file is an .svg that can be edited further in Inkscape/Illustrator or inspected in your browser.

Citation

Please cite

Torsten Schmenger, Gaurav Diwan, Robert Bruce Russell. "PROTEORIZER: A holistic approach to untangle functional consequences of variants of unknown significance", https://doi.org/10.1101/2024.07.16.603688.

and

Gurdeep Singh (1)*+, Torsten Schmenger (1)*+, Juan-Carlos Gonzalez-Sanchez (1), Anastasiia Kutkina (1), Nina Bremec (1), Gaurav Diwan (1), Cristina Lopez (2), Rocio Sotillo (3), Robert B Russell (1)+; "Identify activating, deactivating and resistance variants in protein kinases"; 2023

when using this work.

General use case

In general, this script needs only one mandatory input and accepts two optional ones:

mandatory: an alignment in Clustal format,
optional: a Python dictionary of positional information, formatted as {Protein:{Feature:[Residues]}},
optional: a protein feature dictionary, formatted as {Feature:[Startposition, Endposition]} and numbered based on the protein of interest.

Note: Both dictionaries can either be provided by the user or will be automatically generated by the script. In that case, the script returns both dictionaries for future use.
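As a rough illustration of the two dictionary shapes described above, a sketch might look like the following. The feature names, residue numbers and ranges are placeholders chosen for illustration only, not real annotations, and `residues_with_info` is a hypothetical helper that is not part of the script.

```python
# Positional information: {Protein: {Feature: [Residues]}}
# All feature names and numbers are illustrative placeholders.
positional_info = {
    "P61586": {
        "Binding site": [17, 34],
        "Modified residue": [63],
    },
}

# Protein features: {Feature: [Startposition, Endposition]},
# numbered on the protein of interest. The range is a placeholder as well.
protein_features = {
    "Example region": [12, 19],
}


def residues_with_info(info, protein):
    """Return the sorted residues of `protein` that carry any annotation."""
    residues = set()
    for positions in info.get(protein, {}).values():
        residues.update(positions)
    return sorted(residues)


print(residues_with_info(positional_info, "P61586"))
```

How the script expects these dictionaries to be written to the input text files is not covered by this sketch; the example files in /Latest are the reference for that.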

For example using the files in /Latest:

User-provided dictionaries

python3 Standalone_Annotate_Alignment_V11.py P61586 34 30 RHOA_BlastpExample_ClustalMSA.clustal RHOA_Blastp_info.txt Features_RHOA.txt

Minimal example

python3 Standalone_Annotate_Alignment_V11.py P61586 34 30 RHOA_BlastpExample_ClustalMSA.clustal none none

Preparing alignment

This applies to users who do not already have an alignment. The following steps can be used to create an alignment using blastp and Clustal Omega.

Step 1: Download the FASTA sequence of your protein of interest. For RHOA you could do this via Uniprot, like this. Note: You can easily build this URL as https://rest.uniprot.org/uniprotkb/ + uniprotID + .fasta
Step 2: Use the downloaded FASTA sequence to perform a blastp search for similar sequences. For more information on how to use BLAST please see Blast Help. Make sure to select "Swissprot" as the database, or otherwise make sure that the accessions you retain are Uniprot IDs.
Step 3: Select the sequences you prefer and download them.


Make sure to download the complete sequences.
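The URL pattern from Step 1 can be turned into a small download helper. This is a minimal sketch using only the standard library; `fasta_url` and `download_fasta` are hypothetical names and not part of the script.

```python
import urllib.request

UNIPROT_REST = "https://rest.uniprot.org/uniprotkb/"


def fasta_url(uniprot_id):
    """Build the FASTA URL as described in Step 1: base + uniprotID + .fasta"""
    return UNIPROT_REST + uniprot_id + ".fasta"


def download_fasta(uniprot_id, out_path):
    """Fetch the FASTA record and save it to out_path (needs network access)."""
    with urllib.request.urlopen(fasta_url(uniprot_id)) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(text)
    return out_path


print(fasta_url("P61586"))
# prints: https://rest.uniprot.org/uniprotkb/P61586.fasta
```

Calling `download_fasta("P61586", "P61586.fasta")` would then store the sequence locally for the blastp search in Step 2.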

Step 4: Manually add the FASTA entry of the protein of interest to the top of the file you just downloaded. Change its formatting to roughly mimic the formatting of the remaining entries.


This is what your prepared FASTA sequences should look like.
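If you prefer to do Step 4 programmatically, something along these lines could work. It is only a sketch: `prepend_fasta` is a hypothetical helper, the headers and sequences are placeholders, and it does not adjust the header formatting, which still has to be checked by hand.

```python
def prepend_fasta(poi_fasta, downloaded_fasta):
    """Return one FASTA text with the protein of interest as the first record."""
    poi = poi_fasta.strip()
    if not poi.startswith(">"):
        raise ValueError("protein of interest is not in FASTA format")
    # Exactly one newline between the POI record and the downloaded records.
    return poi + "\n" + downloaded_fasta.lstrip()


# Placeholder records, for illustration only.
poi_record = ">poi_placeholder\nMAAAA\n"
blast_hits = ">hit_1\nMAAAT\n>hit_2\nMAGAA\n"

combined = prepend_fasta(poi_record, blast_hits)
print(combined)
```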

Step 5: Perform a multiple sequence alignment using Clustal Omega. Input the sequences via copy & paste or upload a file, and make sure to select Protein. Download and save the alignment file for use with this script.

Make sure you download the complete MSA (including the clustal version, followed by 2 empty lines, followed by the MSA).
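A quick sanity check for the layout described above (version line, two empty lines, then the alignment) could look like this. It is a hedged sketch: `looks_like_complete_clustal` is a hypothetical helper, it only inspects the first four lines, and it assumes the version line starts with "CLUSTAL".

```python
def looks_like_complete_clustal(text):
    """Check for: CLUSTAL version line, two empty lines, then alignment rows."""
    lines = text.splitlines()
    if len(lines) < 4:
        return False
    has_header = lines[0].startswith("CLUSTAL")
    has_two_blank_lines = lines[1].strip() == "" and lines[2].strip() == ""
    has_alignment = lines[3].strip() != ""
    return has_header and has_two_blank_lines and has_alignment


# Placeholder alignment, for illustration only.
example = (
    "CLUSTAL O(1.2.4) multiple sequence alignment\n"
    "\n"
    "\n"
    "seq_A      MAAAA\n"
    "seq_B      MAAAT\n"
)
print(looks_like_complete_clustal(example))
```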

Step 6: Annotate the alignment manually or programmatically with whatever information you want, following the aforementioned format. We recommend using the Create_Information.py script (see next section).

Required libraries/software (main script)

Python 3.6.8+

Features

NEW Returns dictionaries of collected information for re-usage by the user in future runs.
NEW Fused preparation and main script into a single script.
NEW Upgraded script to python 3 plus added some cosmetic changes.
New Added tooltips that show up on mouseover events. Works on residues with functional information.
New Added transparent rectangles to highlight sequence conservation (= identity) of >= 70 %, based on the sequence of interest. The colors for this are taken from CLUSTAL/Jalview.
New Changed highlighting to circles. Circle radius can later be adjusted based on evidence.
New Added basic heatmapping above the alignment, showing how many highlights per position & per category we have.
New Added start and end positions for each displayed sequence.
Command line functionality.
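The conservation highlighting listed above can be pictured as a per-column identity check against the sequence of interest. The sketch below is not the script's implementation: `conserved_columns` is a hypothetical helper, and it assumes that identity is the fraction of the other aligned sequences sharing the residue of the sequence of interest, and that columns where the sequence of interest has a gap are skipped.

```python
def conserved_columns(poi_seq, other_seqs, threshold=0.70):
    """Return 0-based column indices where identity to the POI is >= threshold."""
    if not other_seqs:
        return []
    conserved = []
    for col, poi_res in enumerate(poi_seq):
        if poi_res == "-":
            continue  # assumption: gap positions in the POI are never highlighted
        matches = sum(seq[col] == poi_res for seq in other_seqs)
        if matches / len(other_seqs) >= threshold:
            conserved.append(col)
    return conserved


# Placeholder aligned sequences of equal length, for illustration only.
poi = "MAK"
others = ["MAR", "MGR", "MAK", "MAK"]
print(conserved_columns(poi, others))
# prints: [0, 1]
```

Column 0 is identical in 4 of 4 sequences and column 1 in 3 of 4 (75 %), so both pass; column 2 matches in only 2 of 4 (50 %) and is left out.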

To use the script, execute the following command: python3 Standalone_Annotate_Alignment_V11.py P61586 34 30 RHOA_BlastpExample_ClustalMSA.clustal RHOA_Blastp_info.txt Features_RHOA.txt

This command has several fields after calling the script:

Field	Example	Description
0	P61586	The uniprot ID of the protein we are interested in
1	34	The position to be highlighted
2	30	The window size; the script shows +/- this many residues around the highlighted position.
3	RHOA_BlastpExample_ClustalMSA.clustal	The alignment file.
4	RHOA_Blastp_info.txt	The file containing positional information. IDs must match those present in the alignment file. Can be left empty by writing "none"; the script will then automatically populate the dictionary via Uniprot.
5	Features_RHOA.txt	A file containing structural/domain features, numbered based on the protein of interest. IDs must match those present in the alignment file. Can be left empty by writing "none"; the script will then automatically populate the dictionary via Interpro.
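To make the field layout concrete, the six fields could be read roughly as follows. This is a sketch under assumptions: `parse_fields` and the dictionary keys are hypothetical and do not reflect how the script itself parses its arguments; "none" is mapped to `None` here.

```python
def parse_fields(args):
    """Map the six command-line fields (0-5) to named values."""
    if len(args) != 6:
        raise ValueError("expected 6 fields, got %d" % len(args))
    poi, position, windowsize, alignment, info, features = args

    def none_or(value, convert=str):
        # "none" (any capitalisation) means: not provided by the user
        return None if value.lower() == "none" else convert(value)

    return {
        "poi": poi,
        "position": none_or(position, int),
        "windowsize": int(windowsize),
        "alignment": alignment,
        "positional_info": none_or(info),
        "features": none_or(features),
    }


command = "P61586 34 30 RHOA_BlastpExample_ClustalMSA.clustal none none"
fields = parse_fields(command.split())
print(fields["position"], fields["positional_info"])
# prints: 34 None
```

In a real command-line program the list passed in would typically be `sys.argv[1:]`.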

Note: The script allows for a little hack here. If you want a (large) .svg containing the whole alignment, just give a big number in field 2, for example 20000. The script will then produce a complete alignment view. New Giving "none" instead of a position to be highlighted (field 1) works the same way and also removes the position-specific rectangle.
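The effect of the window size, including the large-number hack, can be illustrated with a small calculation. This is an assumption-laden sketch rather than the script's code: `window_bounds` is a hypothetical helper, positions are taken to be 1-based, the window is clamped to the sequence, and the length of 200 is an arbitrary placeholder.

```python
def window_bounds(position, windowsize, sequence_length):
    """Return the (start, end) residues shown, 1-based and inclusive."""
    if position is None:
        # assumption: "none" in field 1 shows the complete alignment
        return 1, sequence_length
    start = max(1, position - windowsize)
    end = min(sequence_length, position + windowsize)
    return start, end


print(window_bounds(34, 30, 200))     # prints: (4, 64)
print(window_bounds(34, 20000, 200))  # prints: (1, 200)
```

With a window size far larger than the sequence, both ends are clamped, which is why a value such as 20000 yields the full alignment.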

Named output files: The result file is named according to the input settings, so one can easily try different settings. The name follows this format: poi+"_Position"+str(startposition)+"_Windowsize"+str(windowsize)+".svg"
Conservation: Gives a black rectangle as an indicator of sequence identity (top) for the POI residue at that position.
Residue Numbering: Gives the residue number (every 10 residues), based on sequence of interest, and highlights the input residue in red.
Feature Annotation: Gives a colored background based on type of annotation (taken from uniprot) to the respective residue.
Sorted Sequences: Sequences with fewer uniprot annotations are sorted to the bottom of the alignment.
GAPs removed: Gaps are printed with white color (i.e. invisible on a white background). Additionally, columns with more than 90 % GAPs are removed from the alignment. Sequences affected by this (i.e. the up to 10 % of sequences that did not have a gap at that position) are kept and not removed.
Protein feature highlighting: Highlights protein features, here for example the p-loop, Switch I and the Effector region of RHOA. We currently support displaying up to 9 features (dependent on the given colors in featurecolors on line 2518 of this example script).
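The gap handling in the list above (columns with more than 90 % gaps are dropped, while every sequence is kept) can be sketched as follows. `drop_gappy_columns` is a hypothetical helper written for illustration, assuming equal-length aligned sequences with "-" as the gap character; it is not taken from the script.

```python
def drop_gappy_columns(sequences, max_gap_fraction=0.90):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    All sequences are kept; only the affected columns disappear.
    """
    if not sequences:
        return []
    n_seqs = len(sequences)
    keep = [
        col
        for col in range(len(sequences[0]))
        if sum(seq[col] == "-" for seq in sequences) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in sequences]


# Placeholder alignment: the middle column is a gap in 10 of 11 sequences.
alignment = ["M-K"] * 10 + ["MAK"]
trimmed = drop_gappy_columns(alignment)
print(trimmed[0], trimmed[-1], len(trimmed))
# prints: MK MK 11
```

The middle column has a gap fraction of 10/11 (about 91 %) and is removed, but the one sequence that had a residue there remains in the alignment.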

The most recent type of results

The result of executing python3 Standalone_Annotate_Alignment_V11.py P61586 34 30 RHOA_BlastpExample_ClustalMSA.clustal RHOA_Blastp_info.txt Features_RHOA.txt.
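Following the naming format given in the feature list, the name of the file produced by this command can be predicted. The sketch assumes that startposition corresponds to the highlighted position (field 1); `output_name` is a hypothetical helper.

```python
def output_name(poi, startposition, windowsize):
    """Apply the documented format: poi_Position<start>_Windowsize<size>.svg"""
    return poi + "_Position" + str(startposition) + "_Windowsize" + str(windowsize) + ".svg"


print(output_name("P61586", 34, 30))
# prints: P61586_Position34_Windowsize30.svg
```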

License

tschmenger/Annotate_Alignments is licensed under the GNU General Public License v3.0

Acknowledgements

Thanks to @gurdeep330 for providing feedback and suggestions.
