- My public talk, "AlphaFold2 Paper Reading" by Xingqiang Chen (.key/.pptx), in the AF2-PPT folder.
- Sergey Ovchinnikov's talk on AF2 (slides, .pptx), in the AF2-PPT folder.
We provide 32 Jupyter notebooks covering every algorithm from the AlphaFold2 supplementary materials. Each notebook includes the following (a small sketch in the notebooks' style follows the list):
- Algorithm pseudocode/image reference
- Source code location mapping
- NumPy implementation
- Executable test cases with verification
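To give a flavor of the notebook format, here is a minimal NumPy sketch in the style of the supplement's one_hot encoding (Algorithm 5); the function name and test values are illustrative, not copied from the notebooks.

import numpy as np

def one_hot_nearest_bin(x, v_bins):
    # Algorithm 5 (one_hot): snap each value to its nearest bin center,
    # then emit a one-hot vector over the bins.
    x, v_bins = np.asarray(x), np.asarray(v_bins)
    idx = np.abs(x[..., None] - v_bins).argmin(axis=-1)
    return np.eye(v_bins.size)[idx]

# Tiny verification case, in the spirit of the notebooks' test cells.
p = one_hot_nearest_bin([-3.0, 0.2, 1.6], np.arange(-2, 3))
assert p.argmax(axis=-1).tolist() == [0, 2, 4]  # nearest bins: -2, 0, 2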
Full Algorithm Index
| Category | Algorithms | Notebooks |
|---|---|---|
| Data Preprocessing | MSA Block Deletion | Algorithm 1 |
| Embedding | Input Embedder, relpos, one_hot | Alg 3, Alg 4, Alg 5 |
| Evoformer | Stack, MSA Attention, Triangle Ops | Alg 6-15 |
| Templates | Pair Stack, Pointwise Attention | Alg 16, Alg 17 |
| Extra MSA | Stack, Global Attention | Alg 18, Alg 19 |
| Structure Module | IPA, Backbone, Atom Coords | Alg 20-25 |
| Losses | FAPE, Torsion, pLDDT | Alg 26-29 |
| Recycling | Inference, Training, Embedder | Alg 30, Alg 31, Alg 32 |
| Main Pipeline | Full Inference | Algorithm 2 |
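As a concrete taste of the Embedding row above, this hedged NumPy sketch follows the shape of relpos (Algorithm 4): clipped pairwise residue offsets are one-hot encoded, and the real model then applies a learned linear projection (omitted here). v_max = 32 matches the supplement; the function name is ours.

import numpy as np

def relpos_features(residue_index, v_max=32):
    # Pairwise offsets d_ij, clipped to [-v_max, v_max] and shifted to [0, 2*v_max].
    r = np.asarray(residue_index)
    d = np.clip(r[:, None] - r[None, :], -v_max, v_max) + v_max
    return np.eye(2 * v_max + 1)[d]  # (N, N, 2*v_max + 1) one-hot features

print(relpos_features(np.arange(5)).shape)  # (5, 5, 65)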
We now include AlphaFold3 algorithm notebooks! AF3 introduces significant architectural changes, including diffusion-based structure prediction.
AlphaFold3 Algorithm Index
| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input | MSA Features, Templates, Atom Features | Alg 1-4 |
| MSA Module | Outer Product, MSA Attention | Alg 5-7 |
| Pairformer | Triangle Ops, Single Attention | Alg 8-14 |
| Diffusion | Diffusion Module, AdaLN, Transformer | Alg 15, Alg 16 |
| Confidence | Distogram, Confidence, LDDT | Alg 20-23 |
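To make the Diffusion row concrete, here is a heavily simplified sketch of a single denoising update over atom coordinates. The real AF3 sampler adds a noise schedule, random rotations/translations, and trunk conditioning; denoise_fn below is a stand-in for the trained network, and all names are illustrative.

import numpy as np

def diffusion_sample_step(x, sigma, sigma_next, denoise_fn):
    # One Euler step of an iterative denoiser: move coordinates from noise
    # level sigma toward the (lower) noise level sigma_next.
    x_denoised = denoise_fn(x, sigma)       # network's estimate of clean coordinates
    d = (x - x_denoised) / sigma            # score-like update direction
    return x + (sigma_next - sigma) * d

# Toy run with a dummy "network" that predicts the origin (illustrative only).
x = np.random.randn(10, 3) * 16.0
x = diffusion_sample_step(x, sigma=16.0, sigma_next=8.0, denoise_fn=lambda x, s: 0.0 * x)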
# Official AlphaFold3
AF3-Ref-src/alphafold3-official/
# PyTorch Implementation (lucidrains)
AF3-Ref-src/alphafold3-pytorch/
# Architecture Walkthrough
AF3-Ref-src/alphafold3-walkthrough/

We now include Boltz algorithm notebooks! Boltz is a family of models for biomolecular interaction prediction:
- Boltz-1: First fully open source model to approach AlphaFold3 accuracy
- Boltz-2: Adds binding affinity prediction, approaching FEP accuracy while running ~1000x faster
| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input Processing | Input Embedder, Atom Encoder, RelPos | Alg 1-3 |
| MSA Module | MSA Module, Outer Product, Pair Averaging | Alg 4-6 |
| Pairformer | Pairformer, Triangle Ops, Attention | Alg 7-11 |
| Diffusion | Diffusion Module, Transformer, Fourier | Alg 12-15 |
| Confidence & Affinity | Confidence, Distogram, Affinity (Boltz-2) | Alg 16-18 |
| Loss Functions | Diffusion Loss, Confidence Loss | Alg 19-20 |
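The Fourier entry in the Diffusion row refers to random Fourier features of the diffusion timestep, as used in the AF3/Boltz recipe. A minimal sketch, with the embedding dimension and seed chosen for illustration:

import numpy as np

def fourier_embedding(t, dim=256, seed=0):
    # Project a scalar timestep through a fixed random affine map, then take cos.
    rng = np.random.default_rng(seed)
    w, b = rng.standard_normal(dim), rng.standard_normal(dim)
    return np.cos(2 * np.pi * (t * w + b))  # (dim,) embedding

print(fourier_embedding(0.5).shape)  # (256,)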
# Official Boltz Repository
Boltz-Ref-src/boltz-official/

Papers:
Boltz-2 introduces binding affinity prediction: the first deep-learning model to approach FEP accuracy while running ~1000x faster.
| Category | Key Algorithms | Notebooks |
|---|---|---|
| Affinity Prediction | Affinity Module, Gaussian Smearing | Alg 1-2 |
| Contact Guidance | Contact Conditioning | Alg 3 |
| Enhanced v2 Modules | Input v2, Template v2, Diffusion v2 | Alg 5-7 |
| Improved Confidence | Confidence v2, B-Factor | Alg 8, 10 |
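Gaussian smearing (first row above) is a standard featurization that expands a continuous distance into a soft basis of Gaussians, giving the affinity module a smooth, learnable view of pairwise geometry. A hedged sketch; the range and basis size are illustrative, not Boltz-2's exact values:

import numpy as np

def gaussian_smearing(distances, start=0.0, stop=20.0, num_gaussians=32):
    # Evaluate each distance against evenly spaced Gaussian basis functions.
    centers = np.linspace(start, stop, num_gaussians)
    width = centers[1] - centers[0]
    d = np.asarray(distances)[..., None] - centers
    return np.exp(-0.5 * (d / width) ** 2)  # (..., num_gaussians)

print(gaussian_smearing([1.2, 3.4, 8.0]).shape)  # (3, 32)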
# Official Repository (contains both Boltz-1 and Boltz-2)
Boltz-Ref-src/boltz-official/
# Boltzina - Virtual Screening with Boltz-2
Boltz-Ref-src/boltzina/

We provide a comprehensive fine-tuning framework for adapting protein structure prediction models to downstream tasks.
| Model | Framework | Fine-tuning Support |
|---|---|---|
| AlphaFold2 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| AlphaFold3 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| Boltz-1 | PyTorch | ✅ Full, LoRA, Adapter |
| Boltz-2 | PyTorch | ✅ Full, LoRA, Adapter |
| Strategy | Trainable Params | Use Case |
|---|---|---|
| LoRA | ~0.1% | Small datasets, efficient fine-tuning |
| Adapter | ~1% | Modular, multiple tasks |
| Head-only | ~5% | New prediction tasks |
| Full | 100% | Large datasets, maximum performance |
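The LoRA row is the core of parameter-efficient fine-tuning: the pretrained weight is frozen and only a low-rank update (alpha/r) * B A is trained. A minimal PyTorch sketch of the idea; the repo's finetuning.modules.LoRAModule wraps a whole model, whereas this toy class wraps a single Linear layer and uses illustrative names.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        # Low-rank factors: B starts at zero so training begins as a no-op.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(128, 64), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train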
We support comprehensive task coverage inspired by production platforms like ProteinBase.com:
Drug Discovery
| Task | Outputs | Applications |
|---|---|---|
| Binding Affinity | pKd, pIC50, ΔG, Ki | Lead optimization, SAR |
| Virtual Screening | Hit probability, ranking | HTS prioritization |
| ADMET | Absorption, metabolism, toxicity | Compound triage |
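The affinity outputs above are interconvertible via standard thermodynamics (ΔG = RT ln Kd, pKd = -log10 Kd), which is worth keeping in mind when mixing datasets. A small, self-contained conversion helper (not code from this repo):

import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pkd_to_delta_g(pkd, temperature=298.15):
    # pKd = -log10(Kd)  =>  Kd = 10**(-pKd);  ΔG = R*T*ln(Kd)
    kd = 10.0 ** (-np.asarray(pkd))
    return R_KCAL * temperature * np.log(kd)

print(pkd_to_delta_g(9.0))  # ~ -12.3 kcal/mol for a 1 nM binder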
🔬 Protein Engineering
| Task | Outputs | Applications |
|---|---|---|
| Stability | ΔΔG, Tm shift | Thermostabilization |
| Solubility | Expression score | Biomanufacturing |
| Mutation Effects | Fitness, pathogenicity | Variant analysis |
🧫 Antibody Design
| Task | Outputs | Applications |
|---|---|---|
| Affinity Maturation | CDR binding, ΔΔG | Therapeutic optimization |
| Humanization | Humanness score | Drug development |
| Developability | Aggregation, viscosity | Manufacturing |
⚗️ Enzyme Engineering
| Task | Outputs | Applications |
|---|---|---|
| Activity | kcat, Km, kcat/Km | Catalyst design |
| Specificity | Substrate profiles | Industrial enzymes |
| Directed Evolution | Fitness landscapes | Protein engineering |
Protein-Protein Interactions
| Task | Outputs | Applications |
|---|---|---|
| PPI Binding | Kd, interface stability | Complex analysis |
| Interface Prediction | Contact residues | Structure analysis |
| Hot Spot Detection | ΔΔG per residue | PPI drug targets |
🧬 Function Prediction
| Task | Outputs | Applications |
|---|---|---|
| GO Terms | MF, BP, CC | Annotation |
| EC Numbers | Enzyme classification | Function discovery |
| Localization | Subcellular compartment | Systems biology |
🛡️ Immunology
| Task | Outputs | Applications |
|---|---|---|
| B-cell Epitopes | Epitope probability | Vaccine design |
| T-cell Epitopes | MHC binding | Immunotherapy |
| Immunogenicity | ADA risk | Drug safety |
Structure Quality
| Task | Outputs | Applications |
|---|---|---|
| Confidence | pLDDT, pAE, pTM | Model validation |
| Disorder | IDR prediction | Structure analysis |
| Contacts | Distance maps | Validation |
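For the confidence outputs above, heads typically predict a distribution over error bins and report its expected value; e.g., AF2's pLDDT head uses 50 bins spanning 0-100. A hedged sketch of that reduction (names and shapes illustrative):

import numpy as np

def expected_plddt(bin_probs, num_bins=50):
    # Bin centers 1, 3, ..., 99; pLDDT is the probability-weighted mean.
    centers = (np.arange(num_bins) + 0.5) * (100.0 / num_bins)
    return np.asarray(bin_probs) @ centers  # (num_res,)

probs = np.full((10, 50), 1.0 / 50)  # uniform distribution -> pLDDT of 50
print(expected_plddt(probs))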
from finetuning import TaskRegistry, create_finetuning_pipeline
from finetuning.modules import LoRAModule
from finetuning.heads import AffinityHead, AffinityHeadConfig
# Option 1: Use Task Registry (Recommended)
# List all available tasks
print(TaskRegistry.list_all_tasks()) # 50+ tasks
# Get task info and recommendations
info = TaskRegistry.get_task_info("binding_affinity")
print(f"Recommended LoRA rank: {info.recommended_rank}")
# Create pipeline automatically
pipeline = create_finetuning_pipeline(
task="binding_affinity",
base_model=model,
strategy="lora",
)
# Option 2: Manual Setup
from finetuning import FineTuningConfig, Trainer
# 1. Load pretrained model
model = load_pretrained_boltz2()
# 2. Apply LoRA (only ~0.1% parameters trainable)
lora_model = LoRAModule(model, rank=8, alpha=16.0)
# 3. Add task-specific head
affinity_head = AffinityHead(AffinityHeadConfig())
# 4. Train
config = FineTuningConfig(
strategy="lora",
task="binding_affinity",
lora_rank=8,
)
# train_loader / val_loader: user-provided DataLoaders for the task dataset
trainer = Trainer(lora_model, config, train_loader, val_loader)
trainer.train()
# 5. Save lightweight LoRA weights
lora_model.save_lora_weights("./lora_weights.pt")

finetuning/
├── configs/                    # Configuration classes
│   ├── base_config.py          # FineTuningConfig, ModelConfig, TrainingConfig
│   ├── lora_config.py          # LoRA-specific configuration
│   └── task_config.py          # 25+ task configurations (ProteinBase-style)
├── modules/                    # Fine-tuning modules
│   ├── lora.py                 # LoRA implementation (PyTorch & JAX)
│   ├── adapter.py              # Adapter modules
│   └── prompt_tuning.py        # Prompt tuning
├── heads/                      # Task-specific prediction heads (15+ specialized heads)
│   ├── affinity_head.py        # Binding affinity (Boltz-2 style)
│   ├── property_head.py        # Protein property prediction
│   ├── contact_head.py         # Contact prediction
│   ├── antibody_head.py        # Affinity maturation, humanization, developability
│   ├── ppi_head.py             # PPI binding, interface, hot spots
│   ├── enzyme_head.py          # Activity, specificity, evolution
│   ├── function_head.py        # GO terms, EC numbers, localization
│   └── epitope_head.py         # B-cell, T-cell epitopes, immunogenicity
├── trainers/                   # Training utilities
│   ├── trainer.py              # Main trainer class
│   ├── distributed_trainer.py  # Multi-GPU training
│   └── callbacks.py            # Training callbacks (EarlyStopping, Wandb, etc.)
├── data/                       # Data utilities
│   ├── datasets.py             # 10+ dataset classes for all task types
│   └── transforms.py           # Data augmentation (rotation, MSA dropout)
├── examples/                   # Tutorial notebooks
│   └── finetuning_tutorial.ipynb  # Complete walkthrough
├── registry.py                 # Task registry and factory pattern
└── utils/                      # Utility functions
    ├── checkpoint.py           # Model checkpointing
    └── metrics.py              # Evaluation metrics (lDDT, TM-score, AUROC, etc.)
- DeepMind: AlphaFold-Using-AI-for-scientific-discovery
- DeepMind: alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
- DeepMind: putting-the-power-of-alphafold-into-the-worlds-hands
- Reference papers are listed here; you can download them via the Baidu Cloud Drive link using extraction code 9w2p.
- Reference papers' source code is managed via git submodules under AF2-Ref-src/:
# Official AlphaFold (DeepMind)
AF2-Ref-src/alphafold-official/
# OpenFold (PyTorch implementation)
AF2-Ref-src/openfold/
# ColabFold (Colab-friendly version)
AF2-Ref-src/colabfold/
# MMseqs2 (Sequence search)
AF2-Ref-src/mmseqs2/
# HH-suite (Template search)
AF2-Ref-src/hh-suite/
# trRosetta2 (Predecessor model)
AF2-Ref-src/trRosetta2/
# ESM (Facebook protein language model)
AF2-Ref-src/esm/
# UniRep (Protein representations)
AF2-Ref-src/unirep/
# SeqVec (Sequence embeddings)
AF2-Ref-src/seqvec/

To initialize submodules after cloning:
git submodule update --init --recursive

All input data are freely available from public sources.
Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out).
Training used a version of the PDB downloaded 28/08/2019, while CASP14 template search used a version downloaded 14/05/2020. Template search also used the PDB70 database, downloaded 13/05/2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).
We show experimental structures from the PDB with accessions 6Y4F, 6YJ1, 6VR4, 6SK0, 6FES, 6W6W, 6T1Z, and 7JTL.
For MSA lookup at both training and prediction time, we used:
- UniRef90 v2020_01 (https://ftp.ebi.ac.uk/pub/databases/uniprot/previous_releases/release-2020_01/uniref/),
- BFD (https://bfd.mmseqs.com),
- Uniclust30 v2018_08 (https://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/),
- and MGnify clusters v2018_12 (https://ftp.ebi.ac.uk/pub/databases/metagenomics/peptide_database/2018_12/).

Uniclust30 v2018_08 was further used as input for constructing a distillation structure dataset.
Source code for the AlphaFold model, trained weights, and an inference script is available under an open-source license at https://github.com/deepmind/alphafold.
Neural networks were developed with
- TensorFlow v1 (https://github.com/tensorflow/tensorflow),
- Sonnet v1 (https://github.com/deepmind/sonnet),
- JAX v0.1.69 (https://github.com/google/jax/),
- Haiku v0.0.4 (https://github.com/deepmind/dm-haiku).
For MSA search on UniRef90, MGnify clusters, and reduced BFD we used jackhmmer, and for template search on the PDB SEQRES we used hmmsearch, both from HMMER v3.3 (http://eddylab.org/software/hmmer/).
For template search against PDB70, we used HHsearch from HH-suite v3.0-beta.3 14/07/2017 (https://github.com/soedinglab/hh-suite). For constrained relaxation of structures, we used OpenMM v7.3.1 (https://github.com/openmm/openmm) with the Amber99sb force field.
Docking analysis on DGAT used
- P2Rank v2.1 (https://github.com/rdk/p2rank),
- MGLTools v1.5.6 (https://ccsb.scripps.edu/mgltools/)
- and AutoDock Vina v1.1.2 (http://vina.scripps.edu/download/) on a workstation running Debian GNU/Linux rodete 5.10.40-1rodete1-amd64 x86_64.
Data analysis used
- Python v3.6 (https://www.python.org/),
- NumPy v1.16.4 (https://github.com/numpy/numpy),
- SciPy v1.2.1 (https://www.scipy.org/),
- seaborn v0.11.1 (https://github.com/mwaskom/seaborn),
- scikit-learn v0.24.0 (https://github.com/scikit-learn/),
- Matplotlib v3.3.4 (https://github.com/matplotlib/matplotlib),
- pandas v1.1.5 (https://github.com/pandas-dev/pandas),
- and Colab (https://research.google.com/colaboratory).
- TM-align v20190822 (https://zhanglab.dcmb.med.umich.edu/TM-align) was used for computing TM-scores.
Structure analysis used Pymol v2.3.0 (https://github.com/schrodinger/pymol-open-source).

