Skip to content

Commit

Permalink
Update to all docs. Added device selection to everything that predict…
Browse files Browse the repository at this point in the history
…s in batch on CLI.
  • Loading branch information
ryanemenecker committed Nov 6, 2024
1 parent 20de53d commit d079309
Show file tree
Hide file tree
Showing 19 changed files with 1,224 additions and 567 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ In November 2024, we changed the default version of metapredict from V2 to V3. S

For context, V3 provides major improvements to V2. Metapredict V3 uses a **new network to predict disorder** that in our benchmarks is the most accurate version to date. In addition, *V3 is backwards compatible with V2* and can be used as a drop-in replacement for V2. Although the Python API has been improved to massively simplify how you can use metapredict, we have **for the time being** updated it such that all previously created functions *should still work*. If they do not, please raise an issue and we will fix the problem ASAP!

## What are the major changes for V3?
## What are the major changes for metapredict V3?

1. **A new disorder prediction network**: Metapredict V3 uses a new (more accurate) network for disorder prediction. V1 and V2 are still available!
2. **A new pLDDT prediction network**: metapredict used to rely on an external package called [alphaPredict](https://github.com/ryanemenecker/alphaPredict) for pLDDT prediction. This same network is still available in metapredict when using ``meta.predict_pLDDT()`` by setting ``pLDDT_version=1``. However, the default V2 network is by all metrics better for pLDDT prediction, so we recommend using V2!
Expand Down
157 changes: 111 additions & 46 deletions docs/getting_started.rst

Large diffs are not rendered by default.

11 changes: 5 additions & 6 deletions docs/usage/acknowledgements.rst
Original file line number Diff line number Diff line change
@@ -1,15 +1,13 @@
Acknowledgements
=================

PARROT, created by Dan Griffith, was used to generate the network used for metapredict. See https://pypi.org/project/idptools-parrot/ for some very cool machine learning stuff from Dan.
A modified version of PARROT, created by Dan Griffith, was used to generate the network used for metapredict V3. The original implementation of PARROT was used to generate the V1 and V2 networks. See `https://pypi.org/project/idptools-parrot/ <https://pypi.org/project/idptools-parrot/>`_ for some very cool machine learning stuff. You can also check out the `PARROT paper <https://elifesciences.org/articles/70576>`_.

In addition to using Dan Griffith's tool for ``encode_sequence.py`` was largely written by Dan (originally for PARROT).
In addition to using Dan Griffith's tool for creating metapredict, the original code for ``encode_sequence.py`` was written by Dan.

We would also like to thank the team at MobiDB for creating the database that was used to train V1 this predictor. Check out their awesome stuff at https://mobidb.bio.unipd.it
We would like to thank the **DeepMind** team for developing AlphaFold2 and EBI/UniProt for making these data so readily available.

We would like to thank the **DeepMind** team for developing AlphaFold and EBI/UniProt for making these data so readily available. Their data was used for both V2 and V3 of metapredict.

V3 of metapredict would not be possible without Jeff Lotthammer. Jeff worked on making hyperparameter optimization as well as more advanced machine learning architectures possible, which were critical for the development of V3. In addition, Jeff carried out the actual training of the V3 network and the V2 of the pLDDT prediction network.
We would also like to thank the team at MobiDB for creating the database that was used to train metapredict V1. Check out their awesome stuff at `https://mobidb.bio.unipd.it <https://mobidb.bio.unipd.it>`_


Contributors
Expand All @@ -21,3 +19,4 @@ We'd also like to thank the following folks who have contribute code, reported e
* Broder Schmidt
* Sean Cascarina
* Keith Cheveralls
* Henrik Åhl for help with py3.12 compatibility.
709 changes: 498 additions & 211 deletions docs/usage/command-line.rst

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/usage/troubleshooting.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@ Python Version Issues

We have received occasional feedback that metapredict is not working for a user. A common problem is that the user is using a different version of Python than metapredict was made on.

metapredict was developed using Python version 3.7, but has been tested on 3.8, 3.9 and 3.10 as well. However, metapredict was developed for macOS and Linux, and while we expect it to work for Windows this has been far less rigorously tested.
metapredict was developed using Python version 3.7, but has been tested on 3.8, 3.9, 3.10, 3.11, and 3.12. However, metapredict was developed for macOS and Linux, and while we expect it to work for Windows this has been far less rigorously tested.

If you commonly use a Python version outside of the 3.7 - 3.10 window, a convenient workaround is to use a conda environment that has Python 3.8 set as the default version of Python. For more info on conda, please see https://docs.conda.io/projects/conda/en/latest/index.html
If you commonly use a Python version outside of the 3.7 - 3.12 window, a convenient workaround is to use a conda environment that has Python 3.8 set as the default version of Python. For more info on conda, please see https://docs.conda.io/projects/conda/en/latest/index.html

Once you have conda installed, simply use the command

Expand Down
448 changes: 279 additions & 169 deletions docs/usage/using-in-python.rst

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions metapredict/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
##
## metapredict
## A protein disorder predictor based on a BRNN (IDP-Parrot) trained on the consensus disorder values from
## 8 disorder predictors from 12 proteomes.
## A machine learning-based tool for predicting protein disorder.
##
##
import sys
import importlib.util
Expand Down
258 changes: 156 additions & 102 deletions metapredict/backend/predictor.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

# general imports
import os
import re
from packaging import version as packaging_version
import time
import numpy as np
Expand Down Expand Up @@ -145,6 +146,134 @@ def size_filter(inseqs):

return retdict

# ....................................................................................
#

def check_device(use_device, default_device='cuda'):
'''
Function to check the device was correctly set.
Parameters
---------------
use_device : int or str
Identifier for the device to be used for predictions.
Possible inputs: 'cpu', 'mps', 'cuda', 'cuda:int', or an int that corresponds to
the index of a specific cuda-enabled GPU. If 'cuda' is specified and
cuda.is_available() returns False, instead of falling back to CPU,
metapredict will raise an Exception so you know that you are not
using CUDA as you were expecting.
If 'mps' is specified and mps is not available, an exception will be raised.
default_device : str
The default device to use if device=None.
If device=None and default_device != 'cpu' and default_device is
not available, device_string will be returned as 'cpu'.
I'm adding this in case we want to change the default architecture in the future.
For example, we could make default device 'gpu' where it will check for
cuda or mps and use either if available and then otherwise fall back to CPU.
Returns
---------------
device_string : str
returns the device string as a string.
'''
# if use_device is None, check for cuda.
if use_device==None:
# check if default device is available.
if default_device=='cpu':
return 'cpu'
elif default_device=='mps':
if torch.backends.mps.is_available():
return default_device
else:
return 'cpu'
elif default_device=='cuda':
if torch.cuda.is_available():
return 'cuda'
else:
return 'cpu'
else:
raise MetapredictError("Default device can only be set to 'cpu', 'mps', or 'cuda'")


else:
# if input is an int, make it a string and then do checks.
if isinstance(use_device, int)==True:
use_device=f'cuda:{use_device}'

# if input is a string (it should be...)
if isinstance(use_device, str)==True:
# make use_device lowercase
use_device=use_device.lower()
# if CPU specified, use CPU
if use_device=='cpu':
return use_device
elif use_device=='mps':
# check if mps is available.
if torch.backends.mps.is_available():
return use_device
else:
raise MetapredictError('mps was specified, but mps is not available. Be sure you are running a Mac with mps-supported GPUs and a Pytorch version with mps support (>=2.1)')
elif 'cuda' in use_device:
# make sure cuda is available.
if torch.cuda.is_available()==False:
error_message = f'{use_device} was specified as the device, but torch.cuda.is_available() returned False.'
raise MetapredictError(error_message)
if use_device == 'cuda':
return use_device
elif ':' in use_device:
# make sure a positive integer is specified
pattern = r"^cuda(:\d+)?$"
# if the pattern doesn't match, raise an exception.
if re.match(pattern, str(use_device))==None:
error_message = f'{use_device} was specified as the device, but it does not match the pattern of cuda:int where int is a positive integer.'
raise MetapredictError(error_message)
else:
# make sure there are enough devices such that it is possible that the specified device index works.
device_index = int(use_device.split(":")[1])
num_devices = torch.cuda.device_count()
if device_index >= num_devices:
error_message = f'{use_device} was specified as the device, but there are only {num_devices} cuda-enabled GPUs available.\nRemember, GPU indices are 0-indexed, so cuda:0 is for the first GPU and so on.\nThe max device index you can use based on torch.cuda.device_count() is {num_devices-1}.'
raise MetapredictError(error_message)
return use_device
else:
raise MetapredictError("Device can only be set to: None, a string equal to 'cpu', 'mps', 'cuda', 'cuda:int' where int is some positive integer, or an int that is equal to the index of a specific CUDA-enabled GPU")

# if we made it here, raise error
raise MetapredictError("There is a problem with the check_device function in metapredict/backend/predictor.py.\nPlease raise an issue because you shouldn't be able to see this message.")

def take_care_of_version(version_input):
'''
Function to take care of the version to use when specifying
the network.
Parameters
---------------
version_input : int or str
The version of the network to use.
Returns
---------------
version : str
The version of the network to use.
'''
# make sure the version is a string
version_input=str(version_input)

# now convert over to what we need version to be.
if version_input=='legacy':
version_input='V1'

# if len version is 1, add a 'V' to the front.
if len(version_input)==1:
version_input=f'V{version_input}'

# make version uppercase
version_input=version_input.upper()

return version_input



# ....................................................................................

Expand All @@ -165,7 +294,8 @@ def predict(inputs,
show_progress_bar = False,
force_disable_batch=False,
disable_pack_n_pad = False,
silence_warnings = False):
silence_warnings = False,
default_to_device = 'cuda'):
"""
Batch mode predictor which takes advantage of PyTorch
parallelization such that whether it's on a GPU or a
Expand Down Expand Up @@ -310,6 +440,14 @@ def predict(inputs,
whether to silence warnings such as the one about compatibility
to use pack-n-pad due to torch version restrictions.
default_to_device : str
The default device to use if device=None.
If device=None and default_device != 'cpu' and default_device is
not available, device_string will be returned as 'cpu'.
I'm adding this in case we want to change the default architecture in the future.
For example, we could make default device 'gpu' where it will check for
cuda or mps and use either if available and then otherwise fall back to CPU.
Returns
-------------
DisorderDomain object str dict or list
Expand Down Expand Up @@ -349,22 +487,9 @@ def predict(inputs,
## FIGURE OUT WHAT NETWORK WE ARE USING
##
## ....................................................................................
# make it easy to select the version.
# make sure the version is a string
version=str(version)

# now convert over to what we need version to be.
if version=='legacy':
version='V1'

# if len version is 1, it is likely the user just input 1, 2, or 3. Try to
# add a 'v' before it so we don't have to worry about that. Not explicitly
# checking version because then it will be easier to add more in the future.
if len(version)==1:
version=f'V{version}'

# make version uppercase
version=version.upper()
# normalize such that user can input v#, V#, or # to specify the version
version = take_care_of_version(version)

# make list of possible network inputs
possible_networks=['legacy']
Expand Down Expand Up @@ -393,44 +518,12 @@ def predict(inputs,
##
## ....................................................................................

# by default, don't check if cuda is available.
check_cuda=False

# if a single sequence, just use cpu. Using GPU for a single sequence would be silly.
if isinstance(inputs, str)==True:
device_string='cpu'
else:
# If a batch of sequences, figure out what device to use or if a device was specified.
if use_device==None:
# if not specified, use a cuda enabled GPU if one is available. Otherwise fall back to CPU.
if torch.cuda.is_available():
device_string=f'cuda'
else:
device_string = 'cpu'
else:
if str(use_device).lower()=='cpu':
device_string='cpu'
elif str(use_device).lower()=='mps':
# check if mps is available.
if torch.backends.mps.is_available():
device_string='mps'
else:
raise MetapredictError('use_device was specified as mps, but mps is not available. Be sure you are running a Mac with mps-supported GPUs and a Pytorch version with mps support (>=2.1)')
elif str(use_device).lower()=='cuda':
device_string=f'cuda'
check_cuda=True
elif isinstance(use_device, int)==True:
device_string=f'cuda:{use_device}'
check_cuda=True
else:
raise MetapredictError('The variable use_device can only be set to: None, cpu, mps, cuda, or an integer value specifying the index of a cuda GPU')

# if user manually set either an GPU index or 'cuda', make sure cuda is available.
# this will help us avoid falling back to CPU unintentionally.
if check_cuda==True:
if torch.cuda.is_available()==False:
raise MetapredictError('cuda was specified as use_device, but torch.cuda.is_available() returned False.')

device_string = check_device(use_device, default_device=default_to_device)

# set device
device=torch.device(device_string)

Expand All @@ -442,7 +535,6 @@ def predict(inputs,
if silence_warnings==False:
print('Pytorch version is <= 1.11.0. Disabling pack-n-pad functionality. This might slow down predictions.')


##
## LOAD IN THE NETWORK
##
Expand Down Expand Up @@ -823,7 +915,8 @@ def predict_pLDDT(inputs,
silence_warnings = False,
return_as_disorder_score=False,
plddt_base=0.35,
plddt_top=0.95):
plddt_top=0.95,
default_to_device = 'cuda'):
"""
Batch mode predictor which takes advantage of PyTorch
parallelization such that whether it's on a GPU or a
Expand Down Expand Up @@ -917,6 +1010,14 @@ def predict_pLDDT(inputs,
the highest value plddt can be when converting it to a disorder score
Default=0.95
default_to_device : str
The default device to use if device=None.
If device=None and default_device != 'cpu' and default_device is
not available, device_string will be returned as 'cpu'.
I'm adding this in case we want to change the default architecture in the future.
For example, we could make default device 'gpu' where it will check for
cuda or mps and use either if available and then otherwise fall back to CPU.
Returns
-------------
dict or list
Expand All @@ -943,23 +1044,8 @@ def predict_pLDDT(inputs,
## FIGURE OUT WHAT NETWORK WE ARE USING
##
## ....................................................................................
# make it easy to select the version.
# make sure the version is a string
version=str(version)

# now convert over to what we need version to be.
# i don't expect people to call the plddt one legacy, but you never know.
if version=='legacy':
version='V1'

# if len version is 1, it is likely the user just input 1, 2, or 3. Try to
# add a 'v' before it so we don't have to worry about that. Not explicitly
# checking version because then it will be easier to add more in the future.
if len(version)==1:
version=f'V{version}'

# make version uppercase
version=version.upper()
# normalize such that user can input v#, V#, or # to specify the version
version = take_care_of_version(version)

# make list of possible network inputs
possible_networks=[]
Expand Down Expand Up @@ -1002,43 +1088,11 @@ def predict_pLDDT(inputs,
##
## ....................................................................................

# by default, don't check if cuda is available.
check_cuda=False

# if a single sequence, just use cpu. Using GPU for a single sequence would be silly.
if isinstance(inputs, str)==True:
device_string='cpu'
else:
# If a batch of sequences, figure out what device to use or if a device was specified.
if use_device==None:
# if not specified, use a cuda enabled GPU if one is available. Otherwise fall back to CPU.
if torch.cuda.is_available():
device_string=f'cuda'
else:
device_string = 'cpu'
else:
if str(use_device).lower()=='cpu':
device_string='cpu'
elif str(use_device).lower()=='mps':
# check if mps is available.
if torch.backends.mps.is_available():
device_string='mps'
else:
raise MetapredictError('use_device was specified as mps, but mps is not available. Be sure you are running a Mac with mps-supported GPUs and a Pytorch version with mps support (>=2.1)')
elif str(use_device).lower()=='cuda':
device_string=f'cuda'
check_cuda=True
elif isinstance(use_device, int)==True:
device_string=f'cuda:{use_device}'
check_cuda=True
else:
raise MetapredictError('The variable use_device can only be set to: None, cpu, mps, cuda, or an integer value specifying the index of a cuda GPU')

# if user manually set either an GPU index or 'cuda', make sure cuda is available.
# this will help us avoid falling back to CPU unintentionally.
if check_cuda==True:
if torch.cuda.is_available()==False:
raise MetapredictError('cuda was specified as use_device, but torch.cuda.is_available() returned False.')
device_string = check_device(use_device, default_device=default_to_device)

# set device
device=torch.device(device_string)
Expand Down
Loading

0 comments on commit d079309

Please sign in to comment.