Merge branch 'master' of https://github.com/RobbinBouwmeester/DeepLC

compomics · Feb 13, 2020 · 862e14d · 862e14d
2 parents c8a1b13 + b6897a7
commit 862e14d

Showing 20 changed files with 328 additions and 89 deletions.
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -6,35 +6,44 @@ on:
     - 'v*'
 
 jobs:
-  deploy:
+  publish:
     runs-on: ubuntu-latest
     steps:
     - uses: actions/checkout@v1
+
     - name: Set up Python
       uses: actions/setup-python@v1
       with:
         python-version: '3.7'
+
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
         pip install setuptools wheel twine
+
+    - name: Copy models to GUI directory
+      run: |
+        cp -r deeplc/mods deeplc_gui
+
     - name: Zip GUI directory
       uses: thedoctor0/zip-release@master
       with:
         filename: 'deeplc_gui.zip'
         exclusions: '/*src/*'
         path: 'deeplc_gui/*'
-    - name: GitHub Release
+
+    - name: Create GitHub Release
       uses: docker://antonyurchenko/git-release:v1
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        DRAFT_RELEASE: "true"
+        DRAFT_RELEASE: "false"
         PRE_RELEASE: "false"
         CHANGELOG_FILE: "CHANGELOG.md"
       with:
         args: |
           deeplc_gui.zip
-    - name: Build and publish
+
+    - name: Build and publish to PyPI
       env:
         TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
         TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}

diff --git a/.github/workflows/python_package_test.yml b/.github/workflows/python_package_test.yml
@@ -3,7 +3,7 @@ name: Python package test
 on: [push, pull_request]
 
 jobs:
-  build:
+  test:
     runs-on: ${{ matrix.os }}
     strategy:
       max-parallel: 4

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,7 +5,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to 
 [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## [0.1.11] - 2020-02-13
+- Fixes in GUI
+
+## [0.1.10] - 2020-02-10
+- Include fewer models in package to meet the PyPI 60 MB size limit
+
+## [0.1.9] - 2020-02-09
+- Bugfix: Pass custom activation function
+
+## [0.1.8] - 2020-02-07
+- Fixed support for averaging predictions of groups of models (ensemble) when no models were passed
+- New models for ensemble
+
+## [0.1.7] - 2020-02-07
+- Support for averaging predictions of groups of models (ensemble)
+
+## [0.1.6] - 2020-01-21
+- Fix the latest release
+
+## [0.1.5] - 2020-01-21
+- Spaces in paths to files and installation allowed
+- References to other CompOmics tools removed in GUI
 
 ## [0.1.5] - 2020-02-13
 - Fixes in GUI

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,17 +1,17 @@
 # Contributing
 
 This document briefly describes how to contribute to
-[DeepLC](https://github.com/HUPO-PSI/SpectralLibraryFormat).
+[DeepLC](https://github.com/compomics/DeepLC).
 
 ## Before you begin
 
 If you have an idea for a feature, use case to add or an approach for a bugfix,
 it is best to communicate with the community by creating an issue in
-[GitHub issues](https://github.com/HUPO-PSI/SpectralLibraryFormat/issues).
+[GitHub issues](https://github.com/compomics/DeepLC/issues).
 
 ## How to contribute
 
-- Fork [DeepLC](https://github.com/HUPO-PSI/SpectralLibraryFormat) on GitHub to
+- Fork [DeepLC](https://github.com/compomics/DeepLC) on GitHub to
 make your changes.
 - Commit and push your changes to your
 [fork](https://help.github.com/articles/pushing-to-a-remote/).
@@ -28,28 +28,23 @@ with these changes. You pull request message ideally should include:
 
 ## Development workflow
 
-- Main development happens on the `master` branch.
-
 - When a new version is ready to be published:
 
-    1. Merge into the `releases` branch.
-    2. Change the version number in `setup.py` using
+    1. Change the version number in `setup.py` using
     [semantic versioning](https://semver.org/).
-    3. Update the changelog (if not already done) in `CHANGELOG.md` according to
+    2. Update the changelog (if not already done) in `CHANGELOG.md` according to
     [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
-    4. Set a new tag with the version number, e.g. `git tag 0.1.1.dev1`.
-    5. Push to GitHub, with the tag: `git push; git push --tags`.
-    6. Update the version and sha256 checksum in the bioconda recipe using
-    `conda skeleton pypi deeplc` in the
-    [bioconda-recipes](https://github.com/bioconda/bioconda-recipes) repository.
+    3. Set a new tag with the version number, e.g. `git tag v0.1.5`.
+    4. Push to GitHub, with the tag: `git push; git push --tags`.
 
-- When new commits are pushed to the `releases` branch, the following GitHub
-  Actions are triggered:
+- When a new tag is pushed to (or made on) GitHub that matches `v*`, the
+following GitHub Actions are triggered:
 
     1. The Python package is built and published to PyPI.
     2. A zip archive is made of the `./deeplc_gui/` directory, excluding
     `./deeplc_gui/src` with
     [Zip Release](https://github.com/marketplace/actions/zip-release).
-    3. A GitHub release is made with the zipped GUI files as asset and the new
+    3. A GitHub release is made with the zipped GUI files as assets and the new
     changes listed in `CHANGELOG.md` with
     [Git Release](https://github.com/marketplace/actions/git-release).
+    4. After some time, the bioconda package should get updated automatically.
diff --git a/README.md b/README.md
@@ -21,6 +21,7 @@ DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
   - [Python module](#python-module)
 - [Input files](#input-files)
 - [Prediction models](#prediction-models)
+- [Q&A](#qa)
 
 ---
 
@@ -145,3 +146,117 @@ settings:
 By default, DeepLC selects the best model based on the calibration dataset. If
 no calibration is performed, the first default model is selected. Always keep
 note of the used models and the DeepLC version.
+
+## Q&A
+
+**__Q: So DeepLC is able to predict the retention time for any modification?__**
+
+Yes, DeepLC can predict the retention time of any modification. However, if the
+modification is **very** different from the peptides the model has seen during
+training, the accuracy might not be satisfactory for you. For example, if the model
+has never seen a phosphorus atom before, the accuracy of the prediction is going to
+be low.
+
+**__Q: Installation fails. Why?__**
+
+Please make sure to install DeepLC in a path that does not contain spaces. Run
+the latest LTS version of Ubuntu or Windows 10. Make sure you have enough disk
+space available; TensorFlow needs a surprising amount of disk space. If
+you are still not able to install DeepLC, please feel free to contact us:
+
+Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be
+
+**__Q: I have a special use case that is not supported. Can you help?__**
+
+Of course, please feel free to contact us:
+
+Robbin.Bouwmeester@ugent.be and Ralf.Gabriels@ugent.be
+
+**__Q: DeepLC runs out of memory. What can I do?__**
+
+You can try to reduce the batch size. DeepLC should be able to run if the batch size is low
+enough, even on machines with only 4 GB of RAM.
+
+**__Q: I have a graphics card, but DeepLC is not using the GPU. Why?__**
+
+For now DeepLC defaults to the CPU instead of the GPU. Clearly, because you want
+to use the GPU, you are a power user :-). If you want to make the most of that expensive
+GPU, you need to change or remove the following line (at the top) in __deeplc.py__:
+
+```
+# Set to force CPU calculations
+os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
+```
+
+Also change the same line in the function __reset_keras()__:
+
+```
+# Set to force CPU calculations
+os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
+```
+
+Either remove the line or set it to the index of the GPU to use (a comma-separated
+list selects several GPUs; `'0'` is the first GPU):
+
+```
+# Expose only the first GPU to TensorFlow
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+```
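Note on the GPU answer: a minimal sketch of how the variable can be set from a list of device indices, assuming it runs before TensorFlow is first imported. The helper name `select_devices` is hypothetical and not part of DeepLC. `CUDA_VISIBLE_DEVICES` holds a comma-separated list of device indices, so `'0'` exposes the first GPU and `'-1'` hides all of them.

```python
import os


def select_devices(gpu_indices=None):
    """Set CUDA_VISIBLE_DEVICES from a list of GPU indices (hypothetical helper).

    None or an empty list hides every GPU, which forces CPU calculations.
    """
    if not gpu_indices:
        value = "-1"
    else:
        # The variable expects device indices, not a device count.
        value = ",".join(str(index) for index in gpu_indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU; call this before importing TensorFlow.
select_devices([0])
```

Calling `select_devices()` without arguments restores the CPU-only behaviour described above.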
+
+**__Q: What modification name should I use?__**
+
+The names from unimod are used. The PSI-MS name is used by default, but the Interim name
+is used as a fall-back if the PSI-MS name is not available. Please also see __unimod_to_formula.csv__
+in the folder __unimod/__ for the naming of specific modifications.
+
+**__Q: I have a modification that is not in unimod. How can I add the modification?__**
+
+In the folder __unimod/__ there is the file __unimod_to_formula.csv__ that can be used to
+add modifications. In the CSV file add a name (**that is unique and not present yet**) and
+the change in atomic composition. For example:
+
+```
+Met->Hse,O,H(-2) C(-1) S(-1)
+```
+
+Make sure to use negative signs for the atoms subtracted.
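Note on the composition format: the sketch below shows one way to read a composition string such as `H(-2) C(-1) S(-1)` into signed atom counts. It is an illustration only; `parse_composition` is a hypothetical helper, how DeepLC itself reads __unimod_to_formula.csv__ is not shown in this diff, and treating a bare symbol such as `O` as a count of 1 is an assumption.

```python
import re

# Matches tokens such as "H(-2)" or "C(-1)", and a bare "O" (count of 1 assumed).
_TOKEN = re.compile(r"([A-Z][a-z]?)(?:\((-?\d+)\))?")


def parse_composition(text):
    """Turn a space-separated composition string into {atom: signed count}."""
    counts = {}
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"Unrecognized composition token: {token!r}")
        atom, number = match.groups()
        counts[atom] = counts.get(atom, 0) + (int(number) if number else 1)
    return counts


print(parse_composition("H(-2) C(-1) S(-1)"))
# {'H': -2, 'C': -1, 'S': -1}
```

Tokens outside this simple pattern (for example isotope labels) raise a `ValueError` instead of being silently skipped.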
+
+**__Q: Help, all my predictions are between [0,10]. Why?__**
+
+It is likely you did not use calibration. No problem, but the retention times for training
+purposes were normalized between [0,10]. This means that you probably need to adjust the
+retention time yourself after analysis or use a calibration set as the input.
+
+**__Q: How does the ensemble part of DeepLC work?__**
+
+Models within the same directory are grouped if they overlap in their name. The overlap
+has to be in their full name, except for the last part of the name after a "_"-character.
+
+The following models will be grouped:
+
+```
+full_hc_dia_fixed_mods_a.hdf5
+full_hc_dia_fixed_mods_b.hdf5
+```
+
+None of the following models will be grouped:
+
+```
+full_hc_dia_fixed_mods2_a.hdf5
+full_hc_dia_fixed_mods_b.hdf5
+full_hc_dia_fixed_mods_2_b.hdf5
+```
+
+**__Q: I would like to take the ensemble average of multiple models, even if they are trained on different datasets. How can I do this?__**
+
+Feel free to experiment! Models within the same directory are grouped if they overlap in
+their name. The overlap has to be in their full name, except for the last part of the 
+name after a "_"-character.
+
+The following models will be grouped:
+
+```
+model_dataset1.hdf5
+model_dataset2.hdf5
+```
+
+So you just need to rename your models.
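Note on the ensemble grouping rule: the sketch below reproduces the naming rule from the two answers above, using the same `"_".join(name.split("_")[:-1])` expression that appears in the `deeplc/__main__.py` change in this commit. The function `group_models` is a hypothetical helper written for illustration and is not DeepLC's API.

```python
from collections import defaultdict


def group_models(names):
    """Group model file names by everything before the last "_" (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # Drop the part after the last underscore, e.g. "a.hdf5" or "dataset1.hdf5".
        key = "_".join(name.split("_")[:-1])
        groups[key].append(name)
    return dict(groups)


print(group_models([
    "full_hc_dia_fixed_mods_a.hdf5",
    "full_hc_dia_fixed_mods_b.hdf5",
]))
# {'full_hc_dia_fixed_mods': ['full_hc_dia_fixed_mods_a.hdf5', 'full_hc_dia_fixed_mods_b.hdf5']}
```

With this rule, `model_dataset1.hdf5` and `model_dataset2.hdf5` share the key `model`, while `full_hc_dia_fixed_mods2_a.hdf5` and `full_hc_dia_fixed_mods_2_b.hdf5` each end up in a group of their own. A name without any underscore gets an empty key in this sketch.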
diff --git a/deeplc/__main__.py b/deeplc/__main__.py
@@ -1,7 +1,12 @@
 """
 Code used to run the retention time predictor
 """
+
+__author__ = ["Robbin Bouwmeester", "Ralf Gabriels"]
+__credits__ = ["Robbin Bouwmeester", "Ralf Gabriels", "Prof. Lennart Martens", "Sven Degroeve"]
 __license__ = "Apache License, Version 2.0"
+__maintainer__ = ["Robbin Bouwmeester", "Ralf Gabriels"]
+__email__ = ["Robbin.Bouwmeester@ugent.be", "Ralf.Gabriels@ugent.be"]
 
 # Standard library
 from collections import Counter
@@ -220,12 +225,29 @@ def run(file_pred="",
 
     logging.info("Using DeepLC version %s", __version__)
 
+    if len(file_cal) == 0 and file_model is not None:
+        fm_dict = {}
+        sel_group = ""
+        for fm in file_model:
+            if len(sel_group) == 0:
+                sel_group = "_".join(fm.split("_")[:-1])
+                fm_dict[sel_group]= fm
+                continue
+            m_group = "_".join(fm.split("_")[:-1])
+            if m_group == sel_group:
+                fm_dict[m_group] = fm
+        file_model = fm_dict
+
     # Read input files
     df_pred = pd.read_csv(file_pred)
+    if len(df_pred.columns) < 2:
+        df_pred = pd.read_csv(file_pred,sep=" ")
     df_pred = df_pred.fillna("")
 
     if len(file_cal) > 1:
         df_cal = pd.read_csv(file_cal)
+        if len(df_cal.columns) < 2:
+            df_cal = pd.read_csv(file_cal,sep=" ")
         df_cal = df_cal.fillna("")
 
     # Make a feature extraction object; you can skip this if you do not want to