Linting and updates #53

Merged
merged 116 commits into main from ipmp
Dec 28, 2023
Changes from 25 commits
Commits (116)
53c8fa3
transfer less data to gpu
Sep 1, 2023
f0b43bc
add rigid utils
Sep 16, 2023
fb8b62a
Merge remote-tracking branch 'origin' into ipmp
Sep 17, 2023
8cf1eb9
Merge branch 'main' into ipmp
Sep 19, 2023
437d2f2
Merge remote-tracking branch 'origin' into ipmp
Sep 22, 2023
c97391d
add graph model hparams
Sep 25, 2023
a5d0eaa
add hparams
Sep 28, 2023
fd43446
add support for loading hparam override
Oct 23, 2023
c2c8b01
add optional hparam override to finetune run config
Oct 23, 2023
c2dcbee
set in_memory=True as default for cath and fold classification datasets
Oct 23, 2023
d041a3e
add torch geometric compile
Oct 23, 2023
d83c62e
fix scenario where only seq_pos_enc is a node feature
Oct 23, 2023
eedc9e5
refactor logging to use a single logging call
Oct 23, 2023
005db26
remove duplicated entry
Oct 23, 2023
f184773
minor linting
Oct 23, 2023
90835ea
formatting
Oct 25, 2023
282c6c3
add pre-trained inverse folding config
Oct 25, 2023
eec1308
add EGNN pretraining config
Oct 25, 2023
02abcd8
add find graph encoder hparams
Oct 25, 2023
1847379
add baseline inverse folding config
Oct 25, 2023
881efee
add linters to toml
Oct 25, 2023
034268a
add pre-commit config
Oct 25, 2023
cdfcdb8
Add project support for py311
Oct 25, 2023
ff44a34
Add cpu index url for torch for CI
Oct 25, 2023
bbcfab7
add cpu torch source
Oct 25, 2023
e42cf36
rollback to max python 3.10 due to lack of torchdrug support
Oct 26, 2023
ac9f5b7
bump graphein to 1.7.4 for PyG 2.4+ support (and backwards compatibility)
Oct 26, 2023
736bf63
add warning log
Oct 26, 2023
d3536d7
add list to track processed files in case of overwrite
Oct 26, 2023
4ee8b90
add model attribution script
Oct 27, 2023
d24bcf6
update graphein version to 1.7.5+ and add captum dependency
Oct 27, 2023
03271da
add some more docstrings, clean up
Oct 28, 2023
684b3e5
add attribution to cli
Oct 28, 2023
089aff7
update gitignore
Oct 28, 2023
3c6f3e1
Merge branch 'main' into ipmp
a-r-j Nov 3, 2023
b7ffda0
add DDP support #44
Nov 3, 2023
30679e3
update readme
Nov 3, 2023
151aa10
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 3, 2023
6589266
update readme
Nov 3, 2023
594a596
ignore igfold dataset in overwrite test
Nov 3, 2023
2007c4b
add IGFold prediction datasets
Nov 3, 2023
63ccd14
Add igfold datamodule
Nov 3, 2023
75dc816
fix binary graph classification config
Nov 7, 2023
9771abd
add cdconv model
Nov 12, 2023
0acaaa5
fix test fixture
Nov 12, 2023
e07cca2
update docs
Nov 12, 2023
56a9c5e
linting
Nov 12, 2023
5549a34
linting
Nov 12, 2023
d9c4ca9
add full model config for finetuning
Nov 12, 2023
fc7521c
linting
Nov 12, 2023
5315f7e
fix device request logging
Nov 13, 2023
6ba482b
add multihot label encoder
Nov 13, 2023
33caa05
speed up positional encoding computation
Nov 16, 2023
6f27516
fix device logging
Nov 16, 2023
ca66359
add num_classes to GO datasets
Nov 16, 2023
103b613
lint cdconv config
Nov 16, 2023
1e6090d
add multilabel classification task configs
Nov 16, 2023
9270e44
refactor f1_max for multilabel classif.
Nov 16, 2023
4be174a
add auprc to classification metrics, linting
Nov 16, 2023
b32afda
linting
Nov 16, 2023
a1b3a74
set in_memory=True as default for GO datasets
Nov 16, 2023
06364b2
clean up EC dataset
Nov 17, 2023
91cbc2f
clean up GO dataset
Nov 17, 2023
40baf0c
improve instantiation test
Nov 17, 2023
d85c6f0
set ec to in memory
Nov 17, 2023
82e4024
fix GO labelling
Nov 17, 2023
2e9d73c
fix metrics memory leak
Nov 18, 2023
64658e1
add ec_Reaction sweep
Nov 19, 2023
075c00b
Ignore local Conda env in project directory
amorehead Nov 20, 2023
dc3ec55
add missing mace ca_angles hparams
Nov 20, 2023
317b4d1
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 20, 2023
e3354a3
add addev config
Nov 21, 2023
415e6d3
A dataset loading script for antibody_developability.py
amorehead Nov 21, 2023
3dc7c99
Merge branch 'ipmp' of https://github.com/a-r-j/ProteinWorkshop into …
amorehead Nov 21, 2023
4879e50
add esm BB config
Nov 22, 2023
07830ef
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 22, 2023
06bb80f
Add ESM config for all feature schemes
amorehead Nov 22, 2023
c9c3445
add ppi prediction task updates
Nov 22, 2023
f789df2
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 22, 2023
f8ea13b
add ppi sweep config
Nov 22, 2023
15fc31b
Update test script for masif_dataset.py
amorehead Nov 22, 2023
b30d40f
Update path for masif_site in test script
amorehead Nov 22, 2023
01caa96
mask additional attributes in PPI site prediction
Nov 22, 2023
9f50f04
resolve sequence tokenization
Nov 22, 2023
74004f8
refactor chain identification
Nov 23, 2023
9d9994c
fix error in error fix
Nov 23, 2023
ee8b3cf
Fix fix of a fix
amorehead Nov 23, 2023
d827c84
exclude erroneous examples
Nov 23, 2023
e4b4d62
fix edge cases
Nov 23, 2023
eb44c49
fix edge cases
Nov 23, 2023
76aa0b5
Merge branch 'ipmp' of https://www.github.com/a-r-j/ProteinWorkshop i…
Nov 23, 2023
758a292
add model io utils
Nov 27, 2023
595b5ac
standardise default features for train and finetune configs #61
Nov 30, 2023
381040c
refactor to new recommended jaxtyping/beartype syntax
Dec 26, 2023
9fbc041
typechecker refactor for esm
Dec 26, 2023
69d0474
typechecker refactor for dataset base
Dec 26, 2023
248478b
lint
Dec 26, 2023
e7123f6
remove merge artifact from poetry.lock
Dec 26, 2023
5173259
fix beartype import
Dec 26, 2023
a0a775a
fix broken lock file
Dec 26, 2023
6be97aa
fix broken poetry.lock and update jaxtyping dependency
Dec 26, 2023
3d39e27
fix broken poetry.lock and update jaxtyping dependency
Dec 26, 2023
d7d22b9
use mamba in test workflow
Dec 26, 2023
be5dd40
fix pyg wheel link for torch > 2.1.0
Dec 26, 2023
735640b
update tests
Dec 26, 2023
8d32aa2
lint
Dec 26, 2023
b15bdf2
fix test
Dec 26, 2023
a0782c4
set dummy labels on example_batch
Dec 26, 2023
cc4364e
fix zenodo url
Dec 26, 2023
cefcb8b
fix zenodo url
Dec 26, 2023
c322df6
fix beartype import name
Dec 26, 2023
3e735c8
add changelog
Dec 26, 2023
19dd12d
add attribution to toc
Dec 26, 2023
22fca65
Update install instructions to PyTorch 2.1.2+, and sync docs with REA…
amorehead Dec 28, 2023
3014dc3
fix malformed HTML in quickstart components
Dec 28, 2023
a46c4ed
minor fixes to docs
Dec 28, 2023
25 changes: 25 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,25 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/ambv/black
    rev: 23.9.1
    hooks:
      - id: black
  - repo: https://github.com/jsh9/pydoclint
    # pydoclint version.
    rev: 0.3.3
    hooks:
      - id: pydoclint
        args:
          - "--config=pyproject.toml"
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.1.1
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
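Note: once this config lands, the standard pre-commit workflow should apply: `pre-commit install` registers the hooks for a clone, and `pre-commit run --all-files` applies the whitespace fixers, black, pydoclint and ruff to the whole repository. The PR itself only adds the config file, so enabling the hooks is left to each contributor.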
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -109,7 +109,6 @@
    "vu": "\mathbf{u}",
    "vv": "\mathbf{v}",
    "vw": "\mathbf{w}",
    "vx": "\mathbf{x}",
    "vy": "\mathbf{y}",
    "vz": "\mathbf{z}",
}
3,182 changes: 1,378 additions & 1,804 deletions poetry.lock

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions proteinworkshop/config/dataset/cath.yaml
@@ -9,4 +9,5 @@ datamodule:
   dataset_fraction: 1.0 # Fraction of the dataset to use
   transforms: ${transforms} # Transforms to apply to dataset examples
   overwrite: False # Whether to overwrite the dataset if it already exists
+  in_memory: True # Whether to load the entire dataset into memory
   num_classes: 23 # Number of classes
1 change: 1 addition & 0 deletions proteinworkshop/config/dataset/fold_family.yaml
@@ -9,4 +9,5 @@ datamodule:
   shuffle_labels: False # Whether to shuffle labels for permutation testing
   transforms: ${transforms} # Transforms to apply to dataset examples
   overwrite: False # Whether to overwrite existing dataset files
+  in_memory: True # Whether to load the entire dataset into memory
   num_classes: 1195 # Number of classes
1 change: 1 addition & 0 deletions proteinworkshop/config/dataset/fold_fold.yaml
@@ -9,4 +9,5 @@ datamodule:
   shuffle_labels: False # Whether to shuffle labels for permutation testing
   transforms: ${transforms} # Transforms to apply to dataset examples
   overwrite: False # Whether to overwrite existing dataset files
+  in_memory: True # Whether to load the entire dataset into memory
   num_classes: 1195 # Number of classes
1 change: 1 addition & 0 deletions proteinworkshop/config/dataset/fold_superfamily.yaml
@@ -9,4 +9,5 @@ datamodule:
   shuffle_labels: False # Whether to shuffle labels for permutation testing
   transforms: ${transforms} # Transforms to apply to dataset examples
   overwrite: False # Whether to overwrite existing dataset files
+  in_memory: True # Whether to load the entire dataset into memory
   num_classes: 1195 # Number of classes
1 change: 1 addition & 0 deletions proteinworkshop/config/finetune.yaml
@@ -25,6 +25,7 @@ defaults:
   - finetune: default # Specifies finetuning config. See: proteinworkshop/config/finetune/
   # debugging config (enable through command line, e.g. `python train.py debug=default`)
   - debug: null
+  - optional hparams: ${encoder}_${features}
   - _self_ # see: https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order/. Adding _self_ at bottom means values in this file override defaults.

 task_name: "finetune"
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/egnn_ca_angles.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/egnn_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/egnn_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/egnn_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/egnn_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gcpnet_ca_angles.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gcpnet_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gcpnet_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gcpnet_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gcpnet_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gear_net_edge_ca_angles.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gear_net_edge_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gear_net_edge_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gear_net_edge_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.1
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/gear_net_edge_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/mace_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0010
  decoder_dropout: 0.0
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/mace_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/mace_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/mace_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/schnet_ca_angles.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.1
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/schnet_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/schnet_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/schnet_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/schnet_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/tfn_ca_angles.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/tfn_ca_base.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/tfn_ca_bb.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.001
  decoder_dropout: 0.5
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/tfn_ca_sc.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0003
  decoder_dropout: 0.3
3 changes: 3 additions & 0 deletions proteinworkshop/config/hparams/tfn_ca_seq.yaml
@@ -0,0 +1,3 @@
hparams:
  lr: 0.0001
  decoder_dropout: 0.3
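Taken together with the `optional hparams: ${encoder}_${features}` defaults entry added to finetune.yaml above, these files give each encoder/feature combination its own tuned learning rate and decoder dropout. A minimal sketch of how the composition behaves (assumptions: Hydra >= 1.2 compose API, run from the repository root; names follow the configs shown in this diff):

from hydra import compose, initialize

with initialize(version_base=None, config_path="proteinworkshop/config"):
    cfg = compose(
        config_name="finetune",
        overrides=["encoder=egnn", "features=ca_angles"],
    )

# Hydra interpolates the group choices into "egnn_ca_angles" and, because the
# defaults entry is marked `optional`, loads hparams/egnn_ca_angles.yaml when
# it exists and silently skips the group when it does not (e.g. there is no
# mace_ca_angles.yaml in this batch of commits).
print(cfg.hparams.hparams.lr)               # 0.0003
print(cfg.hparams.hparams.decoder_dropout)  # 0.5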
33 changes: 24 additions & 9 deletions proteinworkshop/config/sweeps/baseline_inverse_folding.yaml
@@ -7,19 +7,16 @@ metric: # Does not matter, as we are using sweep to run the experiment.

 parameters:
   task:
-    values: [inverse_folding]
+    value: inverse_folding

   dataset:
-    values: [cath]
+    value: cath

   encoder:
-    values: [schnet, dimenet_plus_plus, egnn, gcpnet, gear_net_edge]
-
-  optimiser.optimizer.lr:
-    values: [0.0001]
+    values: [schnet, egnn, gcpnet, gear_net_edge, tfn, mace]

   features:
-    values: [ca_base, ca_seq, ca_angles, ca_bb]
+    values: [ca_seq, ca_angles, ca_bb]

   scheduler:
     value: plateau
@@ -28,14 +25,32 @@ parameters:
     value: False

   +aux_task:
-    values: [none, nn_sequence, nn_structure_torsion, nn_structure_r3]
+    values: [none, nn_structure_torsion, nn_structure_r3]

   trainer.max_epochs:
-    value: 150
+    value: 250

   test:
     value: True

+  trainer:
+    value: gpu
+
+  logger:
+    value: wandb
+
+  seed:
+    values: [13, 42, 121]
+
+  name:
+    value: "${hydra:runtime.choices.encoder}_${hydra:runtime.choices.features}_${hydra:runtime.choices.aux_task}_seed_${seed}"
+
+  optimiser.optimizer.lr:
+    value: ${hparams.hparams.lr}
+
+  decoder.residue_type.dropout:
+    value: ${hparams.hparams.decoder_dropout}
+
 command:
   - ${env}
   - HYDRA_FULL_ERROR=1
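With the per-combination hparams files in place, this sweep now pulls `optimiser.optimizer.lr` and `decoder.residue_type.dropout` from `${hparams.hparams.*}` instead of sweeping them directly, so the grid is 6 encoders x 3 feature schemes x 3 auxiliary tasks x 3 seeds = 162 runs. A hedged sketch of registering the sweep from Python (the CLI equivalent is `wandb sweep <file>` followed by `wandb agent <id>`; the project name is an assumption):

import yaml
import wandb

# Load the sweep definition exactly as committed in this PR.
with open("proteinworkshop/config/sweeps/baseline_inverse_folding.yaml") as f:
    sweep_config = yaml.safe_load(f)

# Register the sweep; each agent then executes the `command:` block
# (python proteinworkshop/train.py <hydra overrides>) once per grid point.
sweep_id = wandb.sweep(sweep_config, project="ProteinWorkshop")
print(sweep_id)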
54 changes: 54 additions & 0 deletions proteinworkshop/config/sweeps/find_hparams.yaml
@@ -0,0 +1,54 @@
program: proteinworkshop/train.py
method: grid
name: baseline_hyperparameter_search
metric: # Does not matter, as we are using sweep to run the experiment.
  goal: minimize
  name: val/loss/total

parameters:
  task:
    values: [multiclass_graph_classification]

  dataset:
    values: [fold_family]

  encoder:
    values: [schnet, gear_net_edge, egnn, gcpnet] #, tfn]

  optimiser.optimizer.lr:
    values: [0.00001, 0.0001, 0.0003, 0.001]

  decoder.graph_label.dropout:
    values: [0.0, 0.1, 0.3, 0.5]

  features:
    values: [ca_base, ca_seq, ca_angles, ca_bb, ca_sc]

  scheduler:
    value: plateau

  extras.enforce_tags:
    value: False

  #+aux_task:
  #  values: [none, nn_sequence, nn_structure_torsion, nn_structure_r3]

  trainer.max_epochs:
    value: 300

  test:
    value: True

  logger:
    value: wandb

  name:
    value: "${hydra:runtime.choices.encoder}_${hydra:runtime.choices.features}_lr_${optimiser.optimizer.lr}_d_${decoder.graph_label.dropout}"

command:
  - ${env}
  - HYDRA_FULL_ERROR=1
  - WANDB_START_METHOD=thread
  - python
  - ${program}
  - ${args_no_hyphens}
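For scale: this grid is 4 encoders x 4 learning rates x 4 dropout values x 5 feature schemes = 320 runs on fold_family, selected on val/loss/total. Presumably the per-combination values baked into the hparams/ files above were distilled from a search like this one, though that provenance is an inference rather than something stated in the diff.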
55 changes: 55 additions & 0 deletions proteinworkshop/config/sweeps/pre_train_egnn.yaml
@@ -0,0 +1,55 @@
program: proteinworkshop/train.py
method: grid
name: pretrain_egnn
metric: # Does not matter, as we are using sweep to run the experiment.
  goal: minimize
  name: val/loss/total

parameters:
  task:
    values:
      [
        inverse_folding,
        sequence_denoising,
        #plddt_prediction,
        structure_denoising,
        torsional_denoising,
      ]

  dataset:
    value: afdb_rep_v4

  dataset.datamodule.num_workers:
    value: 16

  encoder:
    values: [egnn]

  optimiser.optimizer.lr:
    values: [0.0001]

  features:
    values: [ca_angles, ca_bb]

  scheduler:
    value: linear_warmup_cosine_decay

  callbacks.model_checkpoint.every_n_epochs:
    value: 1

  extras.enforce_tags:
    value: False

  trainer:
    value: ddp

  +trainer.max_epochs:
    value: 10

command:
  - ${env}
  - HYDRA_FULL_ERROR=1
  - WANDB_START_METHOD=thread
  - python
  - ${program}
  - ${args_no_hyphens}
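One detail worth noting in both sweep files: parameter keys written with a leading `+` (`+aux_task`, `+trainer.max_epochs`) use Hydra's append syntax, which adds a key that is not already present in the composed config rather than overriding an existing one; wandb passes these names through verbatim to the command line via `${args_no_hyphens}`.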