- 2024.6.5 Update: We have uploaded the dataset of PLMSearch & PLMAlign to Zenodo.
This is the implementation of PLMAlign, the pairwise protein sequence alignment tool from "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces specific alignments together with their alignment scores.
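As a rough illustration of the input side, the sketch below builds a residue-by-residue similarity matrix from two per-residue embedding arrays by taking dot products. Random arrays stand in for real protein language model embeddings, and the function name `residue_similarity` is hypothetical; PLMAlign itself may normalize embeddings or post-process these scores differently.

```python
import numpy as np


def residue_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (L_a, L_b) matrix of per-residue embedding dot products.

    emb_a: (L_a, D) embeddings of protein A; emb_b: (L_b, D) embeddings of protein B.
    Entry [i, j] scores residue i of A against residue j of B.
    """
    if emb_a.ndim != 2 or emb_b.ndim != 2 or emb_a.shape[1] != emb_b.shape[1]:
        raise ValueError("expected (L, D) arrays with the same embedding dimension D")
    return emb_a @ emb_b.T


# Random placeholders for real per-residue embeddings (5 and 7 residues, D = 8).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(5, 8))
emb_b = rng.normal(size=(7, 8))
sim = residue_similarity(emb_a, emb_b)
print(sim.shape)  # (5, 7)
```

Each row corresponds to a residue of the first protein and each column to a residue of the second, so this matrix plays the role that a fixed substitution matrix lookup plays in classical alignment.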
Specifically, PLMAlign supports both local and global alignment. Its algorithms and parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) implementations of EMBL-EBI and pLM-BLAST. However, by replacing the fixed substitution matrix with similarities computed as dot products of per-residue embeddings, PLMAlign can capture deep evolutionary information and performs better on remote homology protein pairs.
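To make the local/global distinction concrete, here is a minimal sketch of SW-style (local) and NW-style (global) dynamic programming on top of a precomputed residue similarity matrix, such as embedding dot products. It assumes a simple linear gap penalty of 0.5 per gapped position; PLMAlign and the EMBL-EBI tools use their own gap models and parameters, and the function name `embedding_align` is hypothetical. It is not PLMAlign's code.

```python
import numpy as np


def embedding_align(sim: np.ndarray, gap: float = 0.5, mode: str = "local"):
    """Align two sequences from a residue-pair similarity matrix (hypothetical helper).

    sim[i, j] scores residue i of sequence A against residue j of sequence B,
    e.g. a dot product of their embeddings. A linear penalty `gap` is charged
    per gapped position (an assumption, not PLMAlign's gap model).
    Returns (score, aligned (i, j) residue index pairs).
    """
    if mode not in ("local", "global"):
        raise ValueError("mode must be 'local' or 'global'")
    n, m = sim.shape
    H = np.zeros((n + 1, m + 1))
    if mode == "global":
        # NW-style: leading gaps are penalized in the first row and column.
        H[1:, 0] = -gap * np.arange(1, n + 1)
        H[0, 1:] = -gap * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(H[i - 1, j - 1] + sim[i - 1, j - 1],  # pair residue i-1 with j-1
                       H[i - 1, j] - gap,                    # gap in B
                       H[i, j - 1] - gap)                    # gap in A
            # SW-style: a local alignment may restart anywhere, so clip at 0.
            H[i, j] = max(best, 0.0) if mode == "local" else best
    # Traceback starts at the best cell (local) or the bottom-right corner (global).
    if mode == "local":
        i, j = map(int, np.unravel_index(np.argmax(H), H.shape))
    else:
        i, j = n, m
    score = float(H[i, j])
    pairs = []
    while i > 0 and j > 0:
        if mode == "local" and H[i, j] == 0:
            break  # start of the local alignment reached
        if np.isclose(H[i, j], H[i - 1, j - 1] + sim[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(H[i, j], H[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return score, pairs[::-1]


sim = np.array([[-1.0, -1.0],
                [-1.0, 2.0]])
print(embedding_align(sim, mode="local"))   # (2.0, [(1, 1)])
print(embedding_align(sim, mode="global"))  # (1.0, [(0, 0), (1, 1)])
```

In the toy example the local alignment keeps only the strongly similar residue pair, while the global alignment is forced to include the dissimilar leading pair as well. The double loop is O(L_a × L_b) in pure Python, which is fine for illustration but much slower than a vectorized or compiled implementation.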
PLMAlign web server: dmiip.sjtu.edu.cn/PLMAlign
PLMSearch web server: dmiip.sjtu.edu.cn/PLMSearch 🚀
PLMSearch source code: github.com/maovshao/PLMSearch 🚁
To set up the environment, follow the steps in requirements.sh.
We have released our experimental data, which can be downloaded from plmalign_data or Zenodo:
# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz
To reproduce all our experiments with visualizations, follow the steps in:
- Malidup: malidup.ipynb
- Malisam: malisam.ipynb
Notice: Detailed results are saved in data/alignment_benchmark/result/.
- SCOPe40: scope40.ipynb
Notice: Detailed results are saved in data/scope40_test/output/.
- Run PLMAlign locally by following the example in pipeline.ipynb
Notice: The inputs and outputs of the example are saved in example/.
Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5