updated pLDDT network and docs
ryanemenecker committed Sep 7, 2021
1 parent a11b9f7 commit a1a4fdf
Showing 7 changed files with 55 additions and 4 deletions.
9 changes: 8 additions & 1 deletion README.md
@@ -8,7 +8,7 @@

### In addition to predicting disorder, metapredict also can predict AlphaFold2 pLDDT confidence scores

In addition, metapredict offers predicted pLDDT confidence scores from AlphaFold2. These predicted scores use a bidirectional recurrent neural network (BRNN) trained on the per residue pLDDT (predicted IDDT-Ca) confidence scores generated by AlphaFold2 (AF2). The confidence scores from 9 proteomes (151,970 total proteins) were used to train the BRNN used to generate these scores. The confidence scores from the proteomes of *Rattus norvegicus*, *Danio rerio*, *Dictyostelium discoideum*, *Drosophila melanogaster*, *Mus musculus*, *Saccharomyces cerevisiae*, *Arabidopsis thaliana*, *Homo sapiens*, and *Escherichia coli* were used to generate the BRNN. These pLDDT scores measure the local confidence that AlphaFold2 has in its predicted structure. The scores go from 0-100 where 0 represents low confidence and 100 represents high confidence. For more information, please see: *Highly accurate protein structure prediction with AlphaFold* https://doi.org/10.1038/s41586-021-03819-2. In describing these scores, the team states that regions with pLDDT scores of less than 50 should not be interpreted except as *possible* disordered regions.
In addition, metapredict offers predicted pLDDT confidence scores from AlphaFold2. These predicted scores use a bidirectional recurrent neural network (BRNN) trained on the per-residue pLDDT (predicted lDDT-Cα) confidence scores generated by AlphaFold2 (AF2). The confidence scores (pLDDT) from the proteomes of *Danio rerio*, *Candida albicans*, *Mus musculus*, *Escherichia coli*, *Drosophila melanogaster*, *Methanocaldococcus jannaschii*, *Plasmodium falciparum*, *Mycobacterium tuberculosis*, *Caenorhabditis elegans*, *Dictyostelium discoideum*, *Trypanosoma cruzi*, *Saccharomyces cerevisiae*, *Schizosaccharomyces pombe*, *Rattus norvegicus*, *Homo sapiens*, *Arabidopsis thaliana*, *Zea mays*, *Leishmania infantum*, *Staphylococcus aureus*, *Glycine max*, and *Oryza sativa* were used to train the BRNN. These pLDDT scores measure the local confidence that AlphaFold2 has in its predicted structure. The scores range from 0 to 100, where 0 represents low confidence and 100 represents high confidence. For more information, please see: *Highly accurate protein structure prediction with AlphaFold* https://doi.org/10.1038/s41586-021-03819-2. In describing these scores, the team states that regions with pLDDT scores of less than 50 should not be interpreted except as *possible* disordered regions.
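
As a rough illustration of the "less than 50" guideline, the sketch below scans a list of per-residue scores and reports contiguous runs that fall below a cutoff. It does not call metapredict or alphaPredict; the function name `low_confidence_regions`, its parameters, and the example scores are all hypothetical and only meant to show the idea.

```python
from typing import List, Tuple


def low_confidence_regions(
    scores: List[float], cutoff: float = 50.0, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for runs of scores strictly below `cutoff`.

    Indices are 0-based and `end` is exclusive. Runs shorter than
    `min_length` residues are skipped. This helper is hypothetical and is
    not part of metapredict.
    """
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score < cutoff:
            if start is None:
                start = i  # a new low-confidence run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up scores on the 0-100 pLDDT scale, for illustration only.
example_scores = [92.1, 88.4, 47.0, 35.2, 41.9, 73.5, 49.9, 90.0]
print(low_confidence_regions(example_scores))
```

Following the AlphaFold2 guidance quoted above, any region returned here should be treated only as a *possible* disordered region, not as a confirmed one.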


### What might the predicted pLDDT scores from AlphaFold2 be used for?
@@ -651,6 +651,13 @@ Example data that can be used with metapredict can be found in the metapredict/d

This section is a log of recent changes with metapredict. My hope is that as I change things, this section can help you figure out why a change was made and if it will break any of your current workflows. The first major changes were made for the 0.56 release, so tracking will start there. Reasons are not provided for bug fixes because the reason can be assumed to be fixing the bug...


#### V1.51

Changes:
Updated to require V1.0 of alphaPredict for pLDDT scores. This reduces the prediction error for pLDDT scores from over 9% per residue to about 8% per residue. Documentation was updated for this change.


#### V1.5

Changes:
7 changes: 7 additions & 0 deletions docs/changes.rst
@@ -6,6 +6,13 @@ About

This section is a log of recent changes with metapredict. My hope is that as I change things, this section can help you figure out why a change was made and if it will break any of your current workflows. The first major changes were made for the 0.56 release, so tracking will start there.

V1.51
-----
Changes:
Updated to require V1.0 of alphaPredict for pLDDT scores. This reduces the prediction error for pLDDT scores from over 9% per residue to about 8% per residue. Documentation was updated for this change.



V1.5
-----
Changes:
2 changes: 1 addition & 1 deletion docs/getting_started.rst
@@ -20,7 +20,7 @@ How does metapredict work?

**metapredict** is a deep-learning-based predictor trained on consensus disorder data from 8 different predictors, as pre-computed and provided by `MobiDB <https://mobidb.bio.unipd.it/>`_. Functionally, this means each residue is assigned a score between 0 and 1 which reflects the confidence we have that the residue is disordered (or not). A score of 0.5 means that half of the predictors predict that residue to be disordered. In this way, **metapredict** can help you quickly determine the likelihood that residues are disordered by giving you an approximation of what other predictors would predict (things got pretty 'meta' there, hence the name **metapredict**).
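
To make the meaning of the consensus score concrete, here is a minimal sketch that computes, for each residue, the fraction of predictors that call it disordered. It illustrates the kind of target value described above, not metapredict's network, and the way MobiDB actually builds its consensus may differ. The function name ``consensus_scores`` and the example calls are hypothetical.

```python
from typing import List, Sequence


def consensus_scores(calls: Sequence[Sequence[int]]) -> List[float]:
    """Return the per-residue fraction of predictors voting 'disordered'.

    calls[p][i] is 1 if predictor p calls residue i disordered, else 0.
    Hypothetical helper for illustration; not part of metapredict.
    """
    if not calls:
        return []
    length = len(calls[0])
    if any(len(row) != length for row in calls):
        raise ValueError("every predictor must score the same residues")
    n_predictors = len(calls)
    return [sum(row[i] for row in calls) / n_predictors for i in range(length)]


# Eight made-up predictors scoring a three-residue sequence.
example_calls = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]
print(consensus_scores(example_calls))  # [0.0, 0.5, 1.0]
```

The middle residue gets 0.5 because four of the eight predictors call it disordered, which matches the interpretation given in the paragraph above.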

In addition, metapredict offers predicted confidence scores from AlphaFold2. These predicted scores use a bidirectional recurrent neural network (BRNN) trained on the per residue pLDDT (predicted IDDT-Ca) confidence scores generated by AlphaFold2 (AF2). The confidence scores from 9 proteomes (151,970 total proteins) were used to train the BRNN used to generate these scores. The confidence scores from the proteomes of *Rattus norvegicus*, *Danio rerio*, *Dictyostelium discoideum*, *Drosophila melanogaster*, *Mus musculus*, *Saccharomyces cerevisiae*, *Arabidopsis thaliana*, *Homo sapiens*, and *Escherichia coli* were used to generate the BRNN. These confidence scores measure the local confidence that AlphaFold2 has in its predicted structure. The scores go from 0-100 where 0 represents low confidence and 100 represents high confidence. For more information, please see: *Highly accurate protein structure prediction with AlphaFold* https://doi.org/10.1038/s41586-021-03819-2. In describing these scores, the team states that regions with pLDDT scores of less than 50 should not be interpreted except as *possible* disordered regions.
In addition, metapredict offers predicted confidence scores from AlphaFold2. These predicted scores use a bidirectional recurrent neural network (BRNN) trained on the per-residue pLDDT (predicted lDDT-Cα) confidence scores generated by AlphaFold2 (AF2). The confidence scores (pLDDT) from the proteomes of *Danio rerio*, *Candida albicans*, *Mus musculus*, *Escherichia coli*, *Drosophila melanogaster*, *Methanocaldococcus jannaschii*, *Plasmodium falciparum*, *Mycobacterium tuberculosis*, *Caenorhabditis elegans*, *Dictyostelium discoideum*, *Trypanosoma cruzi*, *Saccharomyces cerevisiae*, *Schizosaccharomyces pombe*, *Rattus norvegicus*, *Homo sapiens*, *Arabidopsis thaliana*, *Zea mays*, *Leishmania infantum*, *Staphylococcus aureus*, *Glycine max*, and *Oryza sativa* were used to train the BRNN. These confidence scores measure the local confidence that AlphaFold2 has in its predicted structure. The scores range from 0 to 100, where 0 represents low confidence and 100 represents high confidence. For more information, please see: *Highly accurate protein structure prediction with AlphaFold* https://doi.org/10.1038/s41586-021-03819-2. In describing these scores, the team states that regions with pLDDT scores of less than 50 should not be interpreted except as *possible* disordered regions.


What might the predicted confidence scores from AlphaFold2 be used for?
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -9,8 +9,8 @@ matplotlib
protfasta
scipy
urllib3
alphaPredict
#
###### Requirements with Version Specifiers ######
# See https://www.python.org/dev/peps/pep-0440/#version-specifiers
#
alphaPredict == 1.0
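
Because the requirement is now an exact pin, it can be useful to check whether an existing environment already satisfies it. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helpers `release_tuple`, `satisfies_exact_pin`, and `installed_version` are hypothetical names, and the comparison handles only plain numeric release segments such as `1.0` or `1.0.0`; it is not a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Convert '1.0' or '1.0.0' to a tuple with trailing zeros dropped.

    Only plain numeric releases are supported (assumption); pre-release
    or local version labels will raise ValueError.
    """
    parts = [int(p) for p in version_string.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_exact_pin(installed: str, pinned: str) -> bool:
    """Mimic '==' matching for simple releases, so 1.0 matches 1.0.0."""
    return release_tuple(installed) == release_tuple(pinned)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("alphaPredict")
    if found is None:
        print("alphaPredict is not installed")
    else:
        print("pin satisfied:", satisfies_exact_pin(found, "1.0"))
```

In practice pip performs this check itself at install time; a script like this is mainly useful for diagnosing an environment that was set up before the pin was added.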
37 changes: 37 additions & 0 deletions metapredict/backend/meta_predict_disorder.py
@@ -24,8 +24,14 @@
PATH = os.path.dirname(os.path.realpath(__file__))

# Setting predictor equal to location of weighted values.

# original network
predictor = "{}/networks/meta_predict_disorder_100e_v1.pt".format(PATH)

# V2 network shows slight increases in accuracy but is still undergoing testing.
# So far, a 0.5% increase in accuracy has been consistently seen. V1 is the published
# network though, so leaving it for the time being.
# predictor = "{}/networks/metapredict_network_v2_200epochs_nl1_hs20.pt".format(PATH)

##################################################################################################
# hyperparameters used when metapredict was trained. Manually setting them here for clarity.
@@ -34,6 +40,36 @@
#


'''
meta_predict_disorder_100e_v1 parameters
# original published network!
device = 'cpu'
hidden_size = 5
num_layers = 1
dtype = 'residues'
num_classes = 1
encoding_scheme = 'onehot'
input_size = 20
problem_type = 'regression'
# metapredict_network_v2_200epochs_nl1_hs20 parameters
# if you want to use V2 network, move this code out of
commented out section and delete similar code below.
device = 'cpu'
hidden_size = 20
num_layers = 1
dtype = 'residues'
num_classes = 1
encoding_scheme = 'onehot'
input_size = 20
problem_type = 'regression'
'''


device = 'cpu'
hidden_size = 5
num_layers = 1
@@ -43,6 +79,7 @@
input_size = 20
problem_type = 'regression'


# set location of saved_weights for load_state_dict
saved_weights = predictor
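
The commented-out block in this hunk asks readers to swap hyperparameters by hand when switching networks. One possible alternative, sketched here under stated assumptions, is to keep each network's weights file and `hidden_size` together in a lookup table so the two cannot drift apart. The names `SHARED`, `NETWORKS`, and `network_config` are hypothetical and are not part of metapredict; the values are copied from the diff above.

```python
import os
from typing import Any, Dict

# Settings that are identical for both networks in the diff.
SHARED: Dict[str, Any] = {
    "device": "cpu",
    "num_layers": 1,
    "dtype": "residues",
    "num_classes": 1,
    "encoding_scheme": "onehot",
    "input_size": 20,
    "problem_type": "regression",
}

# Per-network settings: the weights file name and the hidden size differ.
NETWORKS: Dict[str, Dict[str, Any]] = {
    "v1": {"weights": "meta_predict_disorder_100e_v1.pt", "hidden_size": 5},
    "v2": {
        "weights": "metapredict_network_v2_200epochs_nl1_hs20.pt",
        "hidden_size": 20,
    },
}


def network_config(name: str, base_dir: str) -> Dict[str, Any]:
    """Return the full settings for network `name` (hypothetical helper).

    `base_dir` plays the role of PATH in the original module.
    """
    if name not in NETWORKS:
        raise ValueError(f"unknown network {name!r}; choose from {sorted(NETWORKS)}")
    entry = NETWORKS[name]
    config = dict(SHARED)  # copy so the shared defaults are never mutated
    config["hidden_size"] = entry["hidden_size"]
    config["saved_weights"] = os.path.join(base_dir, "networks", entry["weights"])
    return config


# Example: settings for the published V1 network under a made-up directory.
print(network_config("v1", "/path/to/backend"))
```

With this layout, switching to the V2 network would mean changing one key rather than editing two separate places in the file.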

Binary file not shown.
2 changes: 1 addition & 1 deletion setup.py
@@ -64,7 +64,7 @@
'protfasta',
'scipy',
'urllib3',
'alphaPredict'], # Required packages, pulls from pip if needed; do not use for Conda deployment
'alphaPredict==1.0'], # Required packages, pulls from pip if needed; do not use for Conda deployment
# platforms=['Linux',
# 'Mac OS-X',
# 'Unix',