QSPRpred is an open-source software library for building **Quantitative Structure Property Relationship (QSPR)** models, developed by Gerard van Westen's Computational Drug Discovery group. It provides a unified interface for building QSPR models based on different types of descriptors and machine learning algorithms. We developed this package to support our research, recognizing the need to reduce repetition in our model building workflow and to improve the reproducibility and reusability of our models. In making this package available here, we hope that it may be of use to other researchers as well. QSPRpred is still in active development, and we welcome contributions and feedback from the community.

QSPRpred is designed to be modular and extensible, so that new functionality can be easily added. A command line interface is available for basic use cases, to quickly explore varying scenarios. For more advanced use cases, the Python API offers extra flexibility and control, allowing more complex workflows and additional features.

Internally, QSPRpred relies heavily on the <a href="https://www.rdkit.org">RDKit</a> and <a href="https://scikit-learn.org/stable/">scikit-learn</a> libraries. Furthermore, for saving and loading scikit-learn models, QSPRpred uses <a href="https://github.com/OlivierBeq/ml2json">ml2json</a> for safer and interpretable model serialization. QSPRpred is also interoperable with <a href="https://github.com/OlivierBeq/Papyrus-scripts">Papyrus</a>, a large-scale curated dataset aimed at bioactivity predictions, for data collection. Models developed with QSPRpred are compatible with the group's *de novo* drug design package <a href="https://github.com/CDDLeiden/DrugEx/">DrugEx</a>.

Note that this will install the basic dependencies, but not the optional dependencies. If you want to use the optional dependencies, you can install the package with an option:
- extra : include extra dependencies for PCM models and extra descriptor sets from packages other than RDKit
- deep : include deep learning models (torch and chemprop)
- pyboost : include the pyboost model (requires cupy: `pip install cupy-cudaX`, replacing X with your [CUDA version](https://docs.cupy.dev/en/stable/install.html); you can also obtain the CUDA toolkit from Anaconda: `conda install cudatoolkit`)
- full : include all optional dependencies (requires cupy: `pip install cupy-cudaX`, replacing X with your [CUDA version](https://docs.cupy.dev/en/stable/install.html))

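To see which optional packages are already present in an environment, a small check like the one below may help. It is only a sketch and not part of QSPRpred: the `OPTION_MODULES` mapping and the `missing_modules` helper are hypothetical names, and the mapping covers only the packages named explicitly in the list above (the packages behind `extra` are not listed here, so that option is left out).

```python
from importlib.util import find_spec

# Hypothetical mapping from install option to importable module names,
# based only on the packages named in the list above (an assumption).
OPTION_MODULES = {
    "deep": ["torch", "chemprop"],
    "pyboost": ["cupy"],
}


def missing_modules(option, modules=OPTION_MODULES):
    """Return the modules for `option` that cannot be found in this environment."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in modules[option] if find_spec(name) is None]


if __name__ == "__main__":
    for option in OPTION_MODULES:
        missing = missing_modules(option)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{option}: {status}")
```

An empty list for an option suggests its packages are importable; it does not verify versions or a working CUDA setup.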
### Note on PCM Modelling
If you plan to use QSPRpred to calculate protein descriptors for PCM, make sure to also install Clustal Omega. You can get it via `conda` (**for Linux and macOS only**):

```bash
conda install -c bioconda clustalo
```

or install MAFFT instead:

```bash
conda install -c biocore mafft
```

One of these tools is needed to provide multiple sequence alignments for the PCM descriptors. If Windows is your platform of choice, these tools will need to be installed manually, or a custom implementation of the `MSAProvider` class will have to be made.

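Before computing PCM descriptors, it can be useful to confirm that an alignment tool is actually reachable. The sketch below uses only the Python standard library and is independent of QSPRpred. It assumes the executables are named `clustalo` and `mafft`, which are the usual command names but are not stated in this README; `find_msa_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions: `clustalo` and `mafft` are the usual
# command names provided by the conda packages mentioned above.
MSA_TOOLS = ("clustalo", "mafft")


def find_msa_tools(candidates=MSA_TOOLS, which=shutil.which):
    """Map each candidate executable to its location on PATH (None if absent)."""
    return {name: which(name) for name in candidates}


if __name__ == "__main__":
    found = find_msa_tools()
    available = [name for name, path in found.items() if path]
    if available:
        print("Alignment tools found:", ", ".join(available))
    else:
        print("No alignment tool found; install Clustal Omega or MAFFT.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.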
## Use

After installation, you will have access to various command line features, and you can use the Python API directly (see [Documentation](https://cddleiden.github.io/QSPRPred/docs/)). For a quick start, you can also check out the [Jupyter notebook tutorials](./tutorials/README.md), which document the use of the Python API to build different types of models. The tutorials as well as the [documentation](https://cddleiden.github.io/QSPRPred/docs/use.html) are still a work in progress, and we welcome contributions wherever they are still lacking.

To train the same QSAR model as in the tutorial from the command line, run the following from the tutorial folder:

```bash
python -m qsprpred.data_CLI -i ./data/parkinsons_pivot.tsv -o qspr/data -pr GABAAalpha -pr NMDA -r true -sp random -sf 0.15 -fe Morgan
```