Releases: OlivierBeq/Papyrus-scripts
Version 2.0.0
New feature:
The PapyrusDataset class allows for object-oriented, 'pandas-style' querying.
Changes:
- reader.read_papyrus: raises an error when trying to load the Papyrus++ set with stereochemistry
- preprocess.keep_source: the source argument uses regex matching
- preprocess.keep_organism: the organism argument is now case insensitive when generic_regex=False
- download.download_papyrus: now also downloads the README files
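The difference between regex matching and case-insensitive literal matching can be illustrated with a minimal sketch. The helper `matches` below is hypothetical and is not the library's implementation; only the `generic_regex` flag name comes from the notes above, and treating the non-regex case as a whole-string comparison is an assumption.

```python
import re


def matches(value, pattern, generic_regex=False):
    """Hypothetical helper, for illustration only (not part of Papyrus-scripts)."""
    if generic_regex:
        # Interpret the pattern as a regular expression searched within the value.
        return re.search(pattern, value) is not None
    # Assumed behaviour: literal comparison that ignores letter case.
    return value.lower() == pattern.lower()


organisms = ["Homo sapiens (Human)", "Mus musculus (Mouse)", "homo sapiens (human)"]

# Literal, case-insensitive comparison keeps both spellings of the human entry.
literal = [o for o in organisms if matches(o, "Homo sapiens (Human)")]

# A regex is case sensitive unless flags say otherwise, so only one entry matches.
by_regex = [o for o in organisms if matches(o, r"^Homo", generic_regex=True)]
```

Note that the parentheses in the organism name would act as a regex group if the same string were passed with `generic_regex=True`, which is one reason to keep the two modes separate.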
Additions:
- preprocess.keep_not_match: keep unmatched column values
- preprocess.keep_not_contains: keep records whose specified column does not contain the specified value
- preprocess.keep_dissimilar: keep records whose molecules are not similar to the provided molecule
- preprocess.keep_not_substructure: keep records whose molecules are not substructures of the provided molecule
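The semantics of the first two negated filters can be sketched on plain Python records. The functions `not_match` and `not_contains` are hypothetical stand-ins written for this example, not the library's functions, and the rows are made-up toy data.

```python
# Toy records; the column name and values are invented for illustration.
rows = [
    {"source": "A"},
    {"source": "B"},
    {"source": "A;C"},
]


def not_match(records, column, value):
    """Keep records whose column value is not exactly equal to `value`."""
    return [r for r in records if r[column] != value]


def not_contains(records, column, value):
    """Keep records whose column value does not contain `value` as a substring."""
    return [r for r in records if value not in r[column]]


unmatched = not_match(rows, "source", "A")        # drops only the exact "A" row
not_containing = not_contains(rows, "source", "A")  # also drops "A;C"
```

The two differ only on values such as "A;C", which do not equal "A" but do contain it.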
Full Changelog: 1.0.3...2.0.0
Papyrus-scripts v1.0.3
Bug fixes:
- keep_source now returns an empty dataframe for chunks in which the desired source does not appear
New features:
- The split_by argument of qsar and pcm now supports 'custom-cluster', which splits training and test sets according to a custom cluster assignment rather than an explicit train/test assignment (as is the case when its value is 'custom').
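A cluster-based split of this kind can be sketched as follows: every record carries a cluster label, and whole clusters are assigned to either the training or the test set. The function `split_by_cluster`, its `test_fraction` and `seed` parameters, and the fill-the-test-set-first strategy are all assumptions made for illustration; the release notes do not describe how the library distributes clusters.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Hypothetical helper: assign whole clusters to train or test.

    cluster_of maps a record identifier to its custom cluster label.
    """
    # Group record identifiers by cluster label.
    members = defaultdict(list)
    for record_id, cluster in cluster_of.items():
        members[cluster].append(record_id)

    # Shuffle the clusters reproducibly.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)

    target = test_fraction * len(cluster_of)
    train, test = [], []
    for cluster in clusters:
        # Fill the test set first, one whole cluster at a time.
        bucket = test if len(test) < target else train
        bucket.extend(members[cluster])
    return train, test


# Ten made-up records in five clusters of two.
cluster_of = {f"m{i}": i // 2 for i in range(10)}
train, test = split_by_cluster(cluster_of)
```

Because clusters are never divided, no cluster contributes records to both sets, which is the point of splitting on a cluster assignment instead of on individual records.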
Papyrus-scripts v1.0.2
- Made download disclaimer and errors due to low disk space more evident
- papyrus_scripts.utils.IO.process_data_version now raises an exception stating 'Papyrus data not available (did you download it first?)'
Papyrus-scripts v1.0.1
The Papyrus++ datasets contained duplicated data wrongly associated with multiple assay types (i.e. Ki, KD, EC50, IC50).
The datasets have been updated, and the links in this release and in the db-links branch have been updated accordingly.
Papyrus-scripts v1.0
Version 1.0 of the Papyrus-scripts library.
Allows one to:
- download the Papyrus dataset
- convert it between XZ and GZIP compression
- match the data to structures of the Protein Data Bank
- create FPSubSim2 (extension of FPSim2) files for similarity and substructure searches
- filter the Papyrus data
- model it with QSAR and PCM models
- remove the data files
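The XZ/GZIP conversion listed above can be sketched with the Python standard library alone. This is a minimal illustration of the technique, not the library's own code; the function names `xz_to_gzip` and `gzip_to_xz` are hypothetical.

```python
import gzip
import lzma
import shutil


def xz_to_gzip(src, dst):
    """Decompress an XZ file and recompress its content as GZIP."""
    with lzma.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Stream in blocks so large files never have to fit in memory.
        shutil.copyfileobj(fin, fout)


def gzip_to_xz(src, dst):
    """Decompress a GZIP file and recompress its content as XZ."""
    with gzip.open(src, "rb") as fin, lzma.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Streaming through `shutil.copyfileobj` keeps memory use flat, which matters for multi-gigabyte data files.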