AlphaFind: Discover structure similarity across the entire known proteome

License: MIT (LICENSE) and Apache-2.0 (LICENSE-APACHE)


AlphaFind is a web-based search engine for structure-based search of the entire AlphaFold Protein Structure Database (AlphaFold DB). It accepts a UniProt ID, PDB ID, or gene symbol as input and returns the most similar proteins found in AlphaFold DB, with an option to run an additional search that extends and refines the results. Search results are grouped by source organism and displayed along with several similarity metrics. 3D visualizations of the structural superposition of the proteins are provided, and text filters can be used to find specific organisms or UniProt IDs. For details about the methodology and usage, please see the manual. The website is free and open to all users, with no login requirement.

Vector embeddings and model weights used in AlphaFind are available from the Czech national repository, in the record "AlphaFind: Discover structure similarity across the entire known proteome – data and model". This project uses USalign.

Code Structure

The codebase is divided into three folders:

  • training (model training, index building)
  • api (backend)
  • ui (frontend)

See the README.md files in each folder for more details.

Installation and execution

Prerequisites / Dependencies: Docker (run.sh builds and runs Docker images)

Steps

  1. Clone this repository and change into its directory:
     git clone https://github.com/Coda-Research-Group/AlphaFind.git
     cd AlphaFind
  2. Add execute permissions to the run.sh script:
     chmod +x run.sh
  3. Run run.sh in your terminal:
     ./run.sh
     The script will do the following:
       • build the Docker images for api/, ui/ and training/
       • run the training/ container to prepare the necessary data structures
       • run the api/ container (the backend)
       • run the ui/ container (the frontend)
  4. Open http://localhost:8081 in your browser.
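
Because run.sh builds images and runs the training/ container before the frontend starts, the UI may not respond immediately. The following is a minimal sketch of a readiness check that polls the address from the last step. The helper name wait_for_url, the timeout values, and the assumption that the frontend answers HTTP 200 at its root URL once it is up are illustrative and not part of AlphaFind.

```python
import http.client
import time
import urllib.request


def wait_for_url(url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: assumes the frontend returns 200 at its root
    URL once it is ready to serve requests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP errors all mean
            # "not ready yet"; urllib's URLError is a subclass of OSError.
            pass
        time.sleep(interval)
    return False
```

For example, wait_for_url("http://localhost:8081") could be called from a second terminal after starting ./run.sh; it returns True once the page is reachable and False if the timeout elapses first.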

Data use

The training/data/cifs folder contains a small subset of AlphaFold DB comprising 109 proteins. The full database can be downloaded from AlphaFold DB.

To use your own protein data:

  1. Place your .cif files in the training/data/cifs directory before running run.sh.
  2. Ensure your files follow the naming convention: AF-[UniProtID]-F1-model_v4.cif.
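
Before running run.sh on your own data, it can help to check the file names against the convention above. The sketch below is one possible check, not part of the AlphaFind codebase: the function names are hypothetical, and the pattern assumes that the UniProt ID consists only of uppercase letters and digits.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: the UniProt ID part contains only uppercase letters and digits.
NAME_PATTERN = re.compile(r"^AF-([A-Z0-9]+)-F1-model_v4\.cif$")


def uniprot_id_from_name(filename: str) -> Optional[str]:
    """Return the UniProt ID embedded in `filename`, or None if the
    name does not follow AF-[UniProtID]-F1-model_v4.cif."""
    match = NAME_PATTERN.match(filename)
    return match.group(1) if match else None


def find_misnamed(directory: str) -> List[str]:
    """List the .cif files in `directory` whose names break the convention."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.cif")
        if uniprot_id_from_name(path.name) is None
    )
```

Calling find_misnamed("training/data/cifs") from the repository root returns the names that need to be fixed; an empty list means every .cif file matches the convention.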

To run on the full AlphaFold DB, download it and place the files in the same directory.

Tested on: Ubuntu 22.04 LTS

Cite Us

If you use AlphaFind in your research, please cite the following publication:

@article{prochazka2024alphafind,
  title={AlphaFind: discover structure similarity across the proteome in AlphaFold DB},
  author={Proch{\'a}zka, David and Slanin{\'a}kov{\'a}, Ter{\'e}zia and Olha, Jaroslav and Ro{\v{s}}inec, Adri{\'a}n and Gre{\v{s}}ov{\'a}, Katar{\'\i}na and J{\'a}no{\v{s}}ov{\'a}, Miriama and {\v{C}}ill{\'\i}k, Jakub and Porubsk{\'a}, Jana and Svobodov{\'a}, Radka and Dohnal, Vlastislav and others},
  journal={Nucleic Acids Research},
  pages={gkae397},
  year={2024},
  publisher={Oxford University Press}
}

Additional Information

License

MIT license