
chenxingqiang/alphafold-notebooks


An "AlphaFold2 Codec" reference covering everything in AlphaFold2.



Learning Resources

Papers

PPT

  • My public talk on AlphaFold2 paper reading by Xingqiang Chen (.key/.pptx in the AF2-PPT folder).
  • Sergey Ovchinnikov's talk on AF2 (slides, .pptx in the AF2-PPT folder).

Learning by Code

📓 AlphaFold2 Algorithm Notebooks (32 Complete!)

We provide 32 Jupyter Notebooks covering every algorithm from the AlphaFold2 supplementary materials. Each notebook includes:

  • Algorithm pseudocode/image reference
  • Source code location mapping
  • NumPy implementation
  • Executable test cases with verification

👉 Full Algorithm Index
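For a taste of the notebook style, here is a minimal NumPy sketch of Algorithm 4 (relpos): relative residue offsets are clipped to ±32 and one-hot encoded. The final linear projection to the pair representation, which the notebook does include, is omitted here for brevity.

```python
import numpy as np

def relpos(residue_index, v_bins=32):
    """Minimal sketch of AF2 Algorithm 4 (relpos): clipped relative
    positions, one-hot encoded into 2*v_bins + 1 classes. The linear
    projection to the pair channel dimension is omitted."""
    d = residue_index[:, None] - residue_index[None, :]  # (N, N) pairwise offsets
    d = np.clip(d + v_bins, 0, 2 * v_bins)               # shift and clip to [0, 2*v_bins]
    return np.eye(2 * v_bins + 1)[d]                     # (N, N, 2*v_bins + 1) one-hot

p = relpos(np.arange(5))  # five consecutive residues
print(p.shape)            # (5, 5, 65)
```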

Quick Links by Category

| Category | Algorithms | Notebooks |
|---|---|---|
| Data Preprocessing | MSA Block Deletion | Algorithm 1 |
| Embedding | Input Embedder, relpos, one_hot | Alg 3, Alg 4, Alg 5 |
| Evoformer | Stack, MSA Attention, Triangle Ops | Alg 6-15 |
| Templates | Pair Stack, Pointwise Attention | Alg 16, Alg 17 |
| Extra MSA | Stack, Global Attention | Alg 18, Alg 19 |
| Structure Module | IPA, Backbone, Atom Coords | Alg 20-25 |
| Losses | FAPE, Torsion, pLDDT | Alg 26-29 |
| Recycling | Inference, Training, Embedder | Alg 30, Alg 31, Alg 32 |
| Main Pipeline | Full Inference | Algorithm 2 |
📋 Complete Algorithm List
| # | Algorithm | Notebook |
|---|---|---|
| 1 | MSA Block Deletion | algorithm-1-MSABlockDeletion.ipynb |
| 2 | Inference | algorithm-2-Inference.ipynb |
| 3 | Input Embedder | algorithm-3-InputEmbedder.ipynb |
| 4 | relpos | algorithm-4-relpos.ipynb |
| 5 | one_hot | algorithm-5-one_hot.ipynb |
| 6 | Evoformer Stack | algorithm-6-EvoformerStack.ipynb |
| 7 | MSA Row Attention with Pair Bias | algorithm-7-MSARowAttentionWithPairBias.ipynb |
| 8 | MSA Column Attention | algorithm-8-MSAColumnAttention.ipynb |
| 9 | MSA Transition | algorithm-9-MSATransition.ipynb |
| 10 | Outer Product Mean | algorithm-10-OuterProductMean.ipynb |
| 11 | Triangle Multiplication (Outgoing) | algorithm-11-TriangleMultiplicationOutgoing.ipynb |
| 12 | Triangle Multiplication (Incoming) | algorithm-12-TriangleMultiplicationIncoming.ipynb |
| 13 | Triangle Attention (Starting Node) | algorithm-13-TriangleAttentionStartingNode.ipynb |
| 14 | Triangle Attention (Ending Node) | algorithm-14-TriangleAttentionEndingNode.ipynb |
| 15 | Pair Transition | algorithm-15-PairTransition.ipynb |
| 16 | Template Pair Stack | algorithm-16-TemplatePairStack.ipynb |
| 17 | Template Pointwise Attention | algorithm-17-TemplatePointwiseAttention.ipynb |
| 18 | Extra MSA Stack | algorithm-18-ExtraMsaStack.ipynb |
| 19 | MSA Column Global Attention | algorithm-19-MSAColumnGlobalAttention.ipynb |
| 20 | Structure Module | algorithm-20-StructureModule.ipynb |
| 21 | Rigid from 3 Points | algorithm-21-rigidFrom3Points.ipynb |
| 22 | Invariant Point Attention | algorithm-22-InvariantPointAttention.ipynb |
| 23 | Backbone Update | algorithm-23-BackboneUpdate.ipynb |
| 24 | Compute All Atom Coordinates | algorithm-24-computeAllAtomCoordinates.ipynb |
| 25 | makeRotX | algorithm-25-makeRotX.ipynb |
| 26 | Rename Symmetric Ground Truth Atoms | algorithm-26-renameSymmetricGroundTruthAtoms.ipynb |
| 27 | Torsion Angle Loss | algorithm-27-torsionAngleLoss.ipynb |
| 28 | Compute FAPE | algorithm-28-computeFAPE.ipynb |
| 29 | Predict Per-Residue LDDT | algorithm-29-predictPerResidueLDDT.ipynb |
| 30 | Recycling (Inference) | algorithm-30-RecyclingInference.ipynb |
| 31 | Recycling (Training) | algorithm-31-RecyclingTraining.ipynb |
| 32 | Recycling Embedder | algorithm-32-RecyclingEmbedder.ipynb |
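As a deeper example of what the notebooks implement, below is a minimal NumPy sketch of Algorithm 28 (computeFAPE). For brevity it takes the inverse rigid transforms as precomputed (F, 3, 4) arrays, whereas the notebook builds them from the backbone frames.

```python
import numpy as np

def fape(T_inv_pred, T_inv_true, x_pred, x_true, d_clamp=10.0, z=10.0, eps=1e-4):
    """Minimal sketch of Frame Aligned Point Error (AF2 Algorithm 28).
    T_inv_* are inverse rigid transforms [R | t] of shape (F, 3, 4),
    applied as x -> R @ x + t; x_* are atom positions of shape (A, 3)."""
    def apply(T, x):  # (F, 3, 4), (A, 3) -> (F, A, 3): atoms in every local frame
        R, t = T[:, :, :3], T[:, :, 3]
        return np.einsum('fij,aj->fai', R, x) + t[:, None, :]
    # distance between aligned predicted and true positions, per frame/atom pair
    d = np.sqrt(((apply(T_inv_pred, x_pred)
                  - apply(T_inv_true, x_true)) ** 2).sum(-1) + eps)
    return np.minimum(d, d_clamp).mean() / z  # clamp at 10 A, scale by Z = 10 A
```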

📓 AlphaFold3 Algorithm Notebooks (NEW!)

We now include AlphaFold3 algorithm notebooks! AF3 introduces significant architectural changes including diffusion-based structure prediction.

👉 AlphaFold3 Algorithm Index

Key AF3 Components

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input | MSA Features, Templates, Atom Features | Alg 1-4 |
| MSA Module | Outer Product, MSA Attention | Alg 5-7 |
| Pairformer | Triangle Ops, Single Attention | Alg 8-14 |
| Diffusion | Diffusion Module, AdaLN, Transformer | Alg 15, Alg 16 |
| Confidence | Distogram, Confidence, LDDT | Alg 20-23 |
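One of the smaller new pieces in the diffusion row above is Adaptive LayerNorm (AdaLN, AF3 supplement Algorithm 26), which conditions the diffusion transformer's activations on the single representation. A minimal NumPy sketch follows; the weight names (W_gate, b_gate, W_bias) are illustrative, not the repo's API.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Plain LayerNorm over the last axis, no learnable scale/offset."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def ada_ln(a, s, W_gate, b_gate, W_bias):
    """Sketch of AF3-style AdaLN (Algorithm 26): the normalized conditioning
    s produces a sigmoid gate and a bias that modulate the normalized
    activations a. Weight names here are illustrative placeholders."""
    a, s = layer_norm(a), layer_norm(s)
    gate = 1.0 / (1.0 + np.exp(-(s @ W_gate + b_gate)))  # sigmoid(Linear(s))
    return gate * a + s @ W_bias                         # gate * a + Linear_nobias(s)

c = 64
a = np.random.randn(10, c)   # token activations
s = np.random.randn(10, c)   # conditioning signal
out = ada_ln(a, s, np.random.randn(c, c), np.zeros(c), np.random.randn(c, c))
print(out.shape)  # (10, 64)
```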

AF3 Source Code Submodules

# Official AlphaFold3
AF3-Ref-src/alphafold3-official/

# PyTorch Implementation (lucidrains)
AF3-Ref-src/alphafold3-pytorch/

# Architecture Walkthrough
AF3-Ref-src/alphafold3-walkthrough/

📓 Boltz Algorithm Notebooks (NEW!)

We now include Boltz algorithm notebooks! Boltz is a family of models for biomolecular interaction prediction:

  • Boltz-1: First fully open source model to approach AlphaFold3 accuracy
  • Boltz-2: Adds binding affinity prediction, approaching FEP accuracy 1000x faster

👉 Boltz Algorithm Index

Key Boltz Components

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Input Processing | Input Embedder, Atom Encoder, RelPos | Alg 1-3 |
| MSA Module | MSA Module, Outer Product, Pair Averaging | Alg 4-6 |
| Pairformer | Pairformer, Triangle Ops, Attention | Alg 7-11 |
| Diffusion | Diffusion Module, Transformer, Fourier | Alg 12-15 |
| Confidence & Affinity | Confidence, Distogram, Affinity (Boltz-2) | Alg 16-18 |
| Loss Functions | Diffusion Loss, Confidence Loss | Alg 19-20 |
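As one concrete example from the diffusion row above, the Fourier embedding of the diffusion noise level (AF3 supplement Algorithm 22, reused by Boltz) is tiny: random weights and biases are drawn once at initialization and then frozen. A sketch:

```python
import numpy as np

def make_fourier_embedding(dim, seed=0):
    """Sketch of the AF3/Boltz-style Fourier embedding of the diffusion
    noise level: w, b ~ N(0, 1) are sampled once and frozen, and the
    embedding of a scalar t is cos(2*pi*(t*w + b))."""
    rng = np.random.default_rng(seed)
    w, b = rng.standard_normal(dim), rng.standard_normal(dim)
    return lambda t: np.cos(2 * np.pi * (t * w + b))

embed = make_fourier_embedding(256)
print(embed(0.5).shape)  # (256,)
```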

Boltz Source Code Submodule

# Official Boltz Repository
Boltz-Ref-src/boltz-official/

Papers:

📓 Boltz-2 Specific Notebooks (NEW!)

Boltz-2 introduces binding affinity prediction: it is the first deep-learning model to approach FEP accuracy while running roughly 1000x faster.

👉 Boltz-2 Algorithm Index

Boltz-2 New Features

| Category | Key Algorithms | Notebooks |
|---|---|---|
| Affinity Prediction | Affinity Module, Gaussian Smearing | Alg 1-2 |
| Contact Guidance | Contact Conditioning | Alg 3 |
| Enhanced v2 Modules | Input v2, Template v2, Diffusion v2 | Alg 5-7 |
| Improved Confidence | Confidence v2, B-Factor | Alg 8, 10 |
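Gaussian smearing (first row above) featurizes scalar inter-atomic distances as radial basis functions, SchNet-style, before they enter the affinity module. A minimal sketch follows; the grid bounds and bin count are illustrative defaults, not Boltz-2's exact hyperparameters.

```python
import numpy as np

def gaussian_smearing(d, d_min=0.0, d_max=20.0, n_bins=64):
    """Sketch of Gaussian smearing: expand each distance into Gaussian
    radial basis functions centered on an even grid. Bounds and bin
    count here are illustrative, not Boltz-2's exact values."""
    centers = np.linspace(d_min, d_max, n_bins)       # (n_bins,) RBF centers
    width = centers[1] - centers[0]                   # one bin spacing as sigma
    return np.exp(-((d[..., None] - centers) ** 2) / (2 * width ** 2))

feat = gaussian_smearing(np.array([1.2, 3.7, 8.9]))   # three pairwise distances
print(feat.shape)  # (3, 64)
```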

Boltz-2 Submodules

# Official Repository (contains both Boltz-1 and Boltz-2)
Boltz-Ref-src/boltz-official/

# Boltzina - Virtual Screening with Boltz-2
Boltz-Ref-src/boltzina/

Practice: Modeling Tests with AF2

MD + AlphaFold2


🔧 Fine-tuning Framework (NEW!)

We provide a comprehensive fine-tuning framework for adapting protein structure prediction models to downstream tasks.

👉 Full Fine-tuning Guide

Supported Models

| Model | Framework | Fine-tuning Support |
|---|---|---|
| AlphaFold2 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| AlphaFold3 | JAX/Haiku | ✅ Full, Head-only, LoRA |
| Boltz-1 | PyTorch | ✅ Full, LoRA, Adapter |
| Boltz-2 | PyTorch | ✅ Full, LoRA, Adapter |

Fine-tuning Strategies

| Strategy | Trainable Params | Use Case |
|---|---|---|
| LoRA | ~0.1% | Small datasets, efficient fine-tuning |
| Adapter | ~1% | Modular, multiple tasks |
| Head-only | ~5% | New prediction tasks |
| Full | 100% | Large datasets, maximum performance |
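The LoRA parameter count above follows directly from its construction: a frozen weight W is augmented with a trainable low-rank product B @ A, adding roughly r * (d_in + d_out) parameters per layer. A minimal NumPy sketch of the forward pass (the repo's lora.py provides the actual PyTorch/JAX modules):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Sketch of a LoRA linear layer: y = x @ W.T + (alpha / r) * x @ A.T @ B.T.
    W (d_out, d_in) stays frozen; only A (r, d_in) and B (d_out, r) train."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d_in, d_out, r = 384, 384, 8
x = np.random.randn(2, d_in)
W = np.random.randn(d_out, d_in)          # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01       # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-init
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)  # identity at init (B = 0)
```

Zero-initializing B means the adapted model starts exactly at the pretrained weights, which keeps early fine-tuning stable.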

Supported Tasks (50+ Task Types)

We support comprehensive task coverage inspired by production platforms like ProteinBase.com:

💊 Drug Discovery

| Task | Outputs | Applications |
|---|---|---|
| Binding Affinity | pKd, pIC50, ΔG, Ki | Lead optimization, SAR |
| Virtual Screening | Hit probability, ranking | HTS prioritization |
| ADMET | Absorption, metabolism, toxicity | Compound triage |

🔬 Protein Engineering

| Task | Outputs | Applications |
|---|---|---|
| Stability | ΔΔG, Tm shift | Thermostabilization |
| Solubility | Expression score | Biomanufacturing |
| Mutation Effects | Fitness, pathogenicity | Variant analysis |

🧫 Antibody Design

| Task | Outputs | Applications |
|---|---|---|
| Affinity Maturation | CDR binding, ΔΔG | Therapeutic optimization |
| Humanization | Humanness score | Drug development |
| Developability | Aggregation, viscosity | Manufacturing |

⚗️ Enzyme Engineering

| Task | Outputs | Applications |
|---|---|---|
| Activity | kcat, Km, kcat/Km | Catalyst design |
| Specificity | Substrate profiles | Industrial enzymes |
| Directed Evolution | Fitness landscapes | Protein engineering |

🔗 Protein-Protein Interactions

| Task | Outputs | Applications |
|---|---|---|
| PPI Binding | Kd, interface stability | Complex analysis |
| Interface Prediction | Contact residues | Structure analysis |
| Hot Spot Detection | ΔΔG per residue | PPI drug targets |

🧬 Function Prediction

| Task | Outputs | Applications |
|---|---|---|
| GO Terms | MF, BP, CC | Annotation |
| EC Numbers | Enzyme classification | Function discovery |
| Localization | Subcellular compartment | Systems biology |

🛡️ Immunology

| Task | Outputs | Applications |
|---|---|---|
| B-cell Epitopes | Epitope probability | Vaccine design |
| T-cell Epitopes | MHC binding | Immunotherapy |
| Immunogenicity | ADA risk | Drug safety |

📊 Structure Quality

| Task | Outputs | Applications |
|---|---|---|
| Confidence | pLDDT, pAE, pTM | Model validation |
| Disorder | IDR prediction | Structure analysis |
| Contacts | Distance maps | Validation |

Quick Start

from finetuning import TaskRegistry, create_finetuning_pipeline
from finetuning.modules import LoRAModule
from finetuning.heads import AffinityHead, AffinityHeadConfig

# Option 1: Use the Task Registry (recommended)
# List all available tasks
print(TaskRegistry.list_all_tasks())  # 50+ tasks

# Get task info and recommendations
info = TaskRegistry.get_task_info("binding_affinity")
print(f"Recommended LoRA rank: {info.recommended_rank}")

# Create a pipeline automatically (assumes a pretrained base model has
# already been loaded, as in step 1 of Option 2 below)
pipeline = create_finetuning_pipeline(
    task="binding_affinity",
    base_model=model,
    strategy="lora",
)

# Option 2: Manual setup
from finetuning import FineTuningConfig, Trainer

# 1. Load a pretrained model
model = load_pretrained_boltz2()

# 2. Apply LoRA (only ~0.1% of parameters trainable)
lora_model = LoRAModule(model, rank=8, alpha=16.0)

# 3. Add a task-specific head
affinity_head = AffinityHead(AffinityHeadConfig())

# 4. Train
config = FineTuningConfig(
    strategy="lora",
    task="binding_affinity",
    lora_rank=8,
)
trainer = Trainer(lora_model, config, train_loader, val_loader)
trainer.train()

# 5. Save the lightweight LoRA weights
lora_model.save_lora_weights("./lora_weights.pt")

Module Overview

finetuning/
├── configs/           # Configuration classes
│   ├── base_config.py      # FineTuningConfig, ModelConfig, TrainingConfig
│   ├── lora_config.py      # LoRA-specific configuration
│   └── task_config.py      # 25+ task configurations (ProteinBase-style)
├── modules/           # Fine-tuning modules
│   ├── lora.py             # LoRA implementation (PyTorch & JAX)
│   ├── adapter.py          # Adapter modules
│   └── prompt_tuning.py    # Prompt tuning
├── heads/             # Task-specific prediction heads (15+ specialized heads)
│   ├── affinity_head.py    # Binding affinity (Boltz-2 style)
│   ├── property_head.py    # Protein property prediction
│   ├── contact_head.py     # Contact prediction
│   ├── antibody_head.py    # Affinity maturation, humanization, developability
│   ├── ppi_head.py         # PPI binding, interface, hot spots
│   ├── enzyme_head.py      # Activity, specificity, evolution
│   ├── function_head.py    # GO terms, EC numbers, localization
│   └── epitope_head.py     # B-cell, T-cell epitopes, immunogenicity
├── trainers/          # Training utilities
│   ├── trainer.py          # Main trainer class
│   ├── distributed_trainer.py  # Multi-GPU training
│   └── callbacks.py        # Training callbacks (EarlyStopping, Wandb, etc.)
├── data/              # Data utilities
│   ├── datasets.py         # 10+ dataset classes for all task types
│   └── transforms.py       # Data augmentation (rotation, MSA dropout)
├── examples/          # Tutorial notebooks
│   └── finetuning_tutorial.ipynb  # Complete walkthrough
├── registry.py        # Task registry and factory pattern
└── utils/             # Utility functions
    ├── checkpoint.py       # Model checkpointing
    └── metrics.py          # Evaluation metrics (lDDT, TM-score, AUROC, etc.)
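For orientation, metrics.py computes structure-quality scores such as lDDT, whose core fits in a few lines. Below is a sketch on precomputed distance matrices; the function name and signature are illustrative, not the module's exact API.

```python
import numpy as np

def lddt(d_pred, d_true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Sketch of the lDDT score from precomputed (N, N) distance matrices:
    over reference pairs closer than `cutoff`, the fraction of predicted
    distances preserved within each tolerance, averaged over tolerances."""
    n = len(d_true)
    mask = (d_true < cutoff) & ~np.eye(n, dtype=bool)  # exclude self-pairs
    err = np.abs(d_pred - d_true)[mask]
    return float(np.mean([(err < t).mean() for t in thresholds]))
```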

Blogs

References

Reference papers

📦 AlphaFold2 Reference Source Code (Submodules)

# Official AlphaFold (DeepMind)
AF2-Ref-src/alphafold-official/

# OpenFold (PyTorch implementation)
AF2-Ref-src/openfold/

# ColabFold (Colab-friendly version)
AF2-Ref-src/colabfold/

# MMseqs2 (Sequence search)
AF2-Ref-src/mmseqs2/

# HH-suite (Template search)
AF2-Ref-src/hh-suite/

# trRosetta2 (Predecessor model)
AF2-Ref-src/trRosetta2/

# ESM (Facebook protein language model)
AF2-Ref-src/esm/

# UniRep (Protein representations)
AF2-Ref-src/unirep/

# SeqVec (Sequence embeddings)
AF2-Ref-src/seqvec/

To initialize submodules after cloning:

git submodule update --init --recursive

Data availability

All input data are freely available from public sources.

Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out).

Training used a version of the PDB downloaded 28/08/2019, while CASP14 template search used a version downloaded 14/05/2020. Template search also used the PDB70 database, downloaded 13/05/2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).

We show experimental structures from the PDB with accessions 6Y4F, 6YJ1, 6VR4, 6SK0, 6FES, 6W6W, 6T1Z and 7JTL.

For MSA lookup at both training and prediction time, we used UniRef90 v2020_01 (https://ftp.ebi.ac.uk/pub/databases/uniprot/previous_releases/release-2020_01/uniref/), BFD (https://bfd.mmseqs.com), Uniclust30 v2018_08 (https://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/), and MGnify clusters v2018_12 (https://ftp.ebi.ac.uk/pub/databases/metagenomics/peptide_database/2018_12/). Uniclust30 v2018_08 was further used as input for constructing a distillation structure dataset.

Code and software availability

Source code

Source code for the AlphaFold model, trained weights and an inference script are available under an open-source license at https://github.com/deepmind/alphafold.

Neural networks

Neural networks were developed with

MSA search

For MSA search on UniRef90, MGnify clusters and reduced BFD we used jackhmmer, and for template search on the PDB SEQRES we used hmmsearch, both from HMMER v3.3 (http://eddylab.org/software/hmmer/).

For template search against PDB70, we used HHsearch from HH-suite v3.0-beta.3 14/07/2017 (https://github.com/soedinglab/hh-suite). For constrained relaxation of structures, we used OpenMM v7.3.1 (https://github.com/openmm/openmm) with the Amber99sb force field.

Docking analysis

Docking analysis on DGAT used

Data analysis

Data analysis used

Structure analysis

Structure analysis used PyMOL v2.3.0 (https://github.com/schrodinger/pymol-open-source).
