Releases: compomics/ms2pip
v3.10.0
Added
- Added support for mzML spectrum files (both for evaluating models and for extracting feature vectors).
- New argument `spectrum_id_pattern`: Regular expression pattern to apply to spectrum titles before matching to peptide file entries.
- When using MS²PIP as class instance, the resulting `pred_and_emp` dataframe can also be returned (instead of writing to a file) when setting `return_results` to `True`.
- If requested, retention time prediction with DeepLC is now also enabled if spectrum file is given. This feature was previously only enabled if only a peptide file was given.
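
To illustrate how a `spectrum_id_pattern` might be used, the sketch below applies a regular expression to spectrum titles and keeps the first capture group as the ID to match against the peptide file. The example titles, the pattern, and the helper name `extract_spec_id` are hypothetical. These notes do not specify how MS²PIP applies the pattern internally, so taking the first capture group is an assumption.

```python
import re


def extract_spec_id(title, pattern):
    """Return the first capture group of `pattern` in `title` (hypothetical helper)."""
    match = re.match(pattern, title)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} does not match title {title!r}")
    return match.group(1)


# Hypothetical spectrum titles that carry extra text after the spectrum ID
titles = ["spec_001 RTINSECONDS=12.3", "spec_002 RTINSECONDS=15.8"]
# Keep everything up to the first whitespace character
pattern = r"(\S+)\s"
spec_ids = [extract_spec_id(title, pattern) for title in titles]
```

Raising an error on a non-matching title is a design choice in this sketch: it surfaces a wrong pattern early instead of silently matching nothing.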
Changed
- Improved logging: Use Rich library for logging, show time stamps and message log levels.
- MS²PIP now shows a progress bar instead of a wall of text to display prediction progress.
- `fasta2speclib`: Improved algorithm for variable modification assignment. Combinatorial explosion from variable modifications is now reduced by setting a maximum of modified residues per peptide, instead of arbitrarily selecting a maximum of potentially modified sites per peptide.
- Update README.md (Switch from BadGen to Shields.io).
- Switch to Pyteomics MGF reader.
- Avoid SciPy dependency.
- More optimal use of Numpy in `calc_correlations`.
- Remove `poetry.lock` (not used, avoid unneeded Dependabot PRs).
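
The idea of capping the number of modified residues per peptide can be sketched as follows. This is not the `fasta2speclib` implementation; the function name `modified_forms`, its parameters, and the example peptide are hypothetical, and only a single variable modification is considered.

```python
from itertools import combinations


def modified_forms(peptide, mod_residues, max_mods):
    """Yield tuples of modified positions (0-based), with at most `max_mods` positions each."""
    # All sites that could carry the variable modification
    sites = [i for i, aa in enumerate(peptide) if aa in mod_residues]
    # Enumerate every combination of 0, 1, ..., max_mods modified sites
    for n_mods in range(max_mods + 1):
        yield from combinations(sites, n_mods)


# Hypothetical peptide with three methionines, at most two of them modified
forms = list(modified_forms("MAMSMK", mod_residues={"M"}, max_mods=2))
```

Every potentially modified site remains eligible, but forms with more than `max_mods` modified residues are never generated, which bounds the number of peptidoforms per peptide.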
Fixed
- Vastly improved computational speed and reduced memory usage when using XGBoost model files for prediction in combination with providing a spectrum file (XGB prediction step is now moved out of multiprocessing).
- For optimal performance, feature vectors for predictions from XGBoost model files now also use the traditional `ms2pipC.py` multiprocessing system.
- `fasta2speclib`: Fixed issue where modified versions of peptide were duplicated.
- `spectrum_output`: Various fixes in MSP spectral library file writing for DIA-NN compatibility: Write m/z error of 0.0 for each predicted peak in peak annotation string, ensure modifications in MSP `Mods` field are sorted by position, use `RetentionTime` instead of `RTINSECONDS` in comments field.
- Fixed double spectrum_utils entry in requirements.
- Updated `python_requires` to minimal 3.7, following previously updated test grid.
- Fix spectrum_utils modification off-by-one bug (had no consequences except for plot annotations).
- Fixes #170
- Fix typo in `write_amino_acid_masses` function name.
- Fix missing comma in the setup.py.
Removed
- Removed unsupported Tableau output file option
v3.9.0
New and improved 🚀
- New prediction model for CID-TMT: TMT-labelled peptide spectra acquired on ion trap (trap-type CID), often used for "MultiNotch MS3" (https://dx.doi.org/10.1021/ac502040v) (PR #157)
- Support for Python 3.9 and 3.10; dropped support for end-of-life Python 3.6 (PR #156, fixes #126)
- Support for alternative cleavage rules (digestion enzymes) in `fasta2speclib` (PR #166, fixes #96)
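
As a rough illustration of what a cleavage rule is, the sketch below digests a protein sequence in silico using zero-width regular expressions. The rule names, the simplified patterns, and the `digest` helper are assumptions for illustration and do not reflect how `fasta2speclib` is configured. Missed cleavages are not handled, and splitting on zero-width matches requires Python 3.7 or later.

```python
import re

# Simplified cleavage rules as zero-width regular expressions (assumed, for illustration)
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, unless followed by P
    "lys-c": r"(?<=K)",  # after K
}


def digest(sequence, rule):
    """Split a protein sequence at every position matched by the cleavage rule."""
    # Drop the empty string produced when the sequence ends at a cleavage site
    return [pep for pep in re.split(CLEAVAGE_RULES[rule], sequence) if pep]


# Hypothetical toy sequence digested with two different rules
tryptic_peptides = digest("MKRPARGK", "trypsin")
lysc_peptides = digest("MKRPARGK", "lys-c")
```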
Bugfixes 🐛
- Fixed missing support for XGBoost models in single-prediction mode (PR #157, fixes #155)
- Use oldest-supported-numpy for build in CI testing (PR #157)
Refactoring and minor changes 🔧
- Replaced C models files with their XGBoost counterpart (except for HCD2019 and TMT): Faster compilation, smaller Python package (PR #157)
- Add `model_dir` option to set custom directory for model downloads (CLI, single-prediction CLI, Python API) (PR #169, fixes #165)
- Add docstring to `MS2PIP` class and add example to `README.md` (PR #167, fixes #131)
- Relaxed click version requirements (PR #157, fixes #158)
- Removed XGBoost warnings from the CLI output (PR #157)
- Various fasta2speclib improvements (PR #166)
- Add deeplc option to default config
- Suppress tensorflow warnings
- Replace deprecated pandas append with concat
- Add missing `sptm` and `gptm` to example config.toml (#167)
New prediction models
Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
---|---|---|---|---|
CID-TMT | v20220104 | [in-house dataset] (72 138) | PXD005890 (69 768) | 0.851085 |
v3.8.0
New and improved 🚀
- New models for non-tryptic peptides and immunopeptides! (PR #137) Check out our preprint for more info: https://doi.org/10.1101/2021.11.02.466886
- Support for Windows! Just run `pip install ms2pip` in your Windows terminal, and start predicting. (PR #151)
Bugfixes 🐛
- In DLIB output, a value is now written to the `isDecoy` column. Fixes downstream readout of protein information. (#140, PR #152)
Refactoring and minor changes 🔧
- `.xgboost` model files are now supported directly; no dump to C and compilation required. (PR #137)
New prediction models
Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
---|---|---|---|---|
HCD2021 | v20210416 | [Combined dataset] (520 579) | PXD008034 (35 269) | 0.932361 |
Immuno-HCD | v20210316 | [Combined dataset] (460 191) | PXD005231 (HLA-I) (46 753) / PXD020011 (HLA-II) (23 941) | 0.963736 / 0.942383 |
v3.7.1
v3.7.0
New:
- Command to predict and plot a single spectrum (PR #136)
Improved:
- fasta2speclib improvements (#135)
- Pass through options from config file to DeepLC (fixes #138)
- Pass `num_cpu` to DeepLC, either from the `deeplc` section in the configuration, or from the `num_cpu` option in the fasta2speclib configuration
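
The `num_cpu` pass-through described above might look like the following sketch. The helper name `resolve_deeplc_num_cpu` and the dictionary layout are hypothetical. The notes do not state which source wins when both are set, so giving precedence to the `deeplc` section is an assumption.

```python
def resolve_deeplc_num_cpu(config):
    """Pick num_cpu for DeepLC: the `deeplc` section first, else the top-level option."""
    deeplc_options = config.get("deeplc", {})
    if "num_cpu" in deeplc_options:
        return deeplc_options["num_cpu"]
    # Fall back to the general num_cpu option (None if it is not set either)
    return config.get("num_cpu")


# Hypothetical configurations, with and without a dedicated deeplc section
config_with_section = {"num_cpu": 8, "deeplc": {"num_cpu": 4}}
config_without_section = {"num_cpu": 8}
```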
Fixed:
3.6.3
New:
- Python 3.9 support (PR #122)
New (also published for 3.6.2):
- bioconda package
- biocontainers docker image
- macOS support (PR #95, PR #127), not yet for Python 3.9 (#126)
Fixed:
- MS²PIP now exits on incorrectly configured or unknown modifications, instead of only showing a warning. (#100, PR #101)
- Parsing of C-terminal modifications from a txt config file was broken in v3.6.2. This is now fixed (PR #109)
- The example fasta2speclib configuration file erroneously contained average mass shifts, which has now been updated to the respective monoisotopic mass shifts. (PR #121)
- If a critical error occurs, MS²PIP now exits with status code 1. (#102, PR #123)
- Supported config file extensions are now described in help message and error message (#125, PR #129)
v3.6.2
Fixed in this release:
- Fixes in logging formatting (#64, #65)
- Use float formatting in CSV output
- Retention time predictions can also be added without writing output to file
- When MS²PIP is running in a daemon process, it will not attempt to use multiprocessing
- Various improvements in match_spectra functionality (e.g. sqldb-backend, output, ...)
- General cleanup of repository (e.g. unused models)
v3.6.1
v3.6.0
New since previous release:
- DeepLC integration! Predict spectral libraries with accurate LC retention time prediction, even for modified peptides. Enable DeepLC with the `-r` flag in MS²PIP or by adding `"add_retention_time":true` to the `fasta2speclib` configuration.
- Additional support for TOML-based configuration files: see config.toml example
- New Skyline `.blib` to PEPREC and MGF converter script in conversion_tools
- Various under-the-hood improvements
Includes the following models:
Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
---|---|---|---|---|
HCD | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 |
CID | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 |
iTRAQ | v20190107 | NIST iTRAQ (704 041) | PXD001189 (41 502) | 0.905870 |
iTRAQphospho | v20190107 | NIST iTRAQ phospho (183 383) | PXD001189 (9 088) | 0.843898 |
TMT | v20190107 | Peng Lab TMT Spectral Library (1 185 547) | PXD009495 (36 137) | 0.950460 |
TTOF5600 | v20190107 | PXD000954 (215 713) | PXD001587 (15 111) | 0.746823 |
HCDch2 | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 (+) and 0.644162 (++) |
CIDch2 | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 (+) and 0.813342 (++) |
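
For readers unfamiliar with the metric in the last column, the sketch below computes a Pearson correlation per spectrum between predicted and observed intensities and then takes the median over spectra. It uses only the standard library and toy values invented for illustration. Whether intensities are transformed beforehand, and which ion types are included, is not specified in these notes, so this is an assumed simplification and not the MS²PIP evaluation code.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_pearson(spectra):
    """Median of the per-spectrum correlations over (predicted, observed) pairs."""
    return median(pearson(predicted, observed) for predicted, observed in spectra)


# Toy (predicted, observed) intensity pairs, invented for illustration
toy_spectra = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),
]
toy_median = median_pearson(toy_spectra)
```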