PLMAlign utilizes per-residue embeddings as input to obtain specific alignments and more refined similarity

vikramalva/PLMAlign

 
 

PLMAlign

  • 2024.6.5 Update: We have uploaded the PLMSearch & PLMAlign dataset to Zenodo.

This is the implementation of PLMAlign, a pairwise protein sequence alignment tool introduced in "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces specific alignments together with their alignment scores.

Specifically, PLMAlign supports both local and global alignment. The algorithm and parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) implementations provided by EMBL-EBI. However, by replacing the fixed substitution matrix with similarity scores computed as the dot product of per-residue embeddings, PLMAlign is able to capture deep evolutionary information and performs better on remote homology protein pairs.
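
As a rough illustration of this idea, the sketch below builds a similarity matrix from dot products of per-residue embeddings and runs a simplified SW (local) or NW (global) recursion over it. It is not the code in this repository: the function names `embedding_similarity` and `align_score` are hypothetical, the linear gap penalty of 1.0 is an assumed placeholder (the actual gap parameters are not stated here), and toy one-hot vectors stand in for real protein language model embeddings.

```python
def embedding_similarity(emb_a, emb_b):
    """Return sim[i][j] = dot product of residue i of A and residue j of B.

    emb_a and emb_b are lists of equal-length float vectors, one per residue.
    This matrix plays the role of a fixed substitution matrix lookup.
    """
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in emb_b] for ra in emb_a]


def align_score(sim, gap=1.0, mode="local"):
    """Best alignment score over a similarity matrix.

    mode="local" follows the Smith-Waterman recursion (scores floored at 0),
    mode="global" follows the Needleman-Wunsch recursion.
    Assumption: a single linear gap penalty, chosen only for illustration.
    """
    n, m = len(sim), len(sim[0])
    local = mode == "local"
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for leading gaps on either sequence.
        for i in range(1, n + 1):
            H[i][0] = -gap * i
        for j in range(1, m + 1):
            H[0][j] = -gap * j
    best_local = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = max(
                H[i - 1][j - 1] + sim[i - 1][j - 1],  # align residue i with j
                H[i - 1][j] - gap,                    # gap in sequence B
                H[i][j - 1] - gap,                    # gap in sequence A
            )
            if local:
                score = max(score, 0.0)
                best_local = max(best_local, score)
            H[i][j] = score
    return best_local if local else H[n][m]


# Toy example: one-hot "embeddings"; B matches the last two residues of A.
emb_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
emb_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sim = embedding_similarity(emb_a, emb_b)
print(align_score(sim, mode="local"))   # 2.0
print(align_score(sim, mode="global"))  # 1.0
```

In the toy example the local score ignores the unmatched first residue of A, while the global score pays one gap penalty for it. A full aligner would also keep a traceback to recover the aligned residue pairs.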

Quick links

Webserver

PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign ✈️

PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀

PLMSearch source code : github.com/maovshao/PLMSearch 🚁

Requirements

Follow the steps in requirements.sh

Data preparation

We have released our experiment data, which can be downloaded from plmalign_data or Zenodo.

# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz

Reproduce all our experiments

All of our experiments can be reproduced, with visualizations, by following the steps in:

Notice: Detailed results are saved in data/alignment_benchmark/result/.

Notice: Detailed results are saved in data/scope40_test/output/.

Run PLMAlign locally

Notice: the inputs and outputs of the example are saved in example/.

Citation

Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5
