Spectrum feature generator #178

Draft

ArthurDeclercq wants to merge 32 commits into main from spectrum-feature-generator

Collaborator

ArthurDeclercq commented Aug 16, 2024

No description provided.

ArthurDeclercq and others added 17 commits

February 24, 2024 15:48


          initial commit

fdceeba


          finalize ms2 feature generation

5374ed8


          add rustyms

60207a3


          remove exit statement fixed IM required value

ae39844


          change logger.info to debug

9b98c4d


          added profile decorator to get timings for functions

5e45756


          removed profile as standard rescore debug statement

304777c


          added new basic features

95ee475


          fixes for ms2 feature generator, removed multiprocessing

73f4573


          return empty list on parsing error with rustyms, removed multiprocessing

947233e


          add deeplc_calibration psm set

24ce565


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

114b006


          remove unused import

33c38b0


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

40425c7


          Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

b810b8c


          Merge tag 'main' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

69b5d1a


          Merge pull request #177 from compomics/main

6e2d102

pull main in spectrum-feature-generator

RalfG added this to the v3.2.0 milestone

RalfG added the feature label

ArthurDeclercq and others added 11 commits

August 21, 2024 13:25


          integrate mumble into ms2branch

11fdc51


          Merge remote-tracking branch 'origin/main' into spectrum-feature-generator

3140c44


          temp removal of sage features before rescoring

883169a


          Merge branch 'main' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

97865e7


          remove psm_file features when rescoring with mumble

da39ae8


          linting

37fff28


          add hyperscore calculation

e8b59f3


          calibration fixes

c51cd34


          changes for mumble implementation

295e37f


          change openms peptide formatting

909860d


          add mumble psm filtering functionality

c5902c2

ArthurDeclercq and others added 4 commits

November 22, 2024 13:36


          Merge branch 'spectrum-feature-generator' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

6eaceb2


          remove pyopenms dependency for hyperscore calculation

5ce55f5


          fix spectrum_id accession

986c5f6


          Merge branch 'spectrum-feature-generator' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

bbecf6a

paretje reviewed

ms2rescore/core.py

+                      (psm_list["qvalue"] <= 0.01)
+                      & (psm_list["rank"] <= max_rank)
+                      & (~psm_list["is_decoy"])
+                      & ([metadata.get("original_psm", True) for metadata in psm_list["metadata"]])

Collaborator

paretje Jan 6, 2025

This seems like it might be quite inefficient. However, I'm not sure it can be improved significantly, given that original_psm is stored in the metadata dict. Keeping it as a series instead of a list might be better, or adding it as a column to the dataframe.

ms2rescore/utils.py

Comment on lines +121 to +124

+                          if original_matched_ions_pct > matched_ions[i]:
+                              keep[i] = False
+                          else:
+                              keep[i] = True

Collaborator

paretje Jan 6, 2025

Suggested change

-                          if original_matched_ions_pct > matched_ions[i]:
-                              keep[i] = False
-                          else:
-                              keep[i] = True
+                          keep[i] = original_matched_ions_pct <= matched_ions[i]
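
To illustrate the suggestion above: the comparison already evaluates to a boolean, so it can be assigned directly instead of branching. The following is a minimal sketch with made-up values; only the assignment pattern mirrors the reviewed code, and the surrounding loop is an assumption about how `keep` is filled.

```python
# Hypothetical values, not taken from the PR.
original_matched_ions_pct = 0.40
matched_ions = [0.25, 0.40, 0.55]

keep_branching = [None] * len(matched_ions)
keep_direct = [None] * len(matched_ions)

for i in range(len(matched_ions)):
    # Original form: branch on the comparison, then assign a literal.
    if original_matched_ions_pct > matched_ions[i]:
        keep_branching[i] = False
    else:
        keep_branching[i] = True

    # Suggested form: assign the (negated) comparison result directly.
    keep_direct[i] = original_matched_ions_pct <= matched_ions[i]
```

Both lists end up identical, which is why the shorter form is a safe drop-in replacement.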

ms2rescore/utils.py

Comment on lines +108 to +111

+                  if "matched_ions_pct" in psm_list[0].rescoring_features:
+                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
+                  else:
+                      return psm_list

Collaborator

paretje Jan 6, 2025

Suggested change

-                  if "matched_ions_pct" in psm_list[0].rescoring_features:
-                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
-                  else:
-                      return psm_list
+                  if "matched_ions_pct" not in psm_list[0].rescoring_features:
+                      return psm_list
+                  else:
+                      matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]

ms2rescore/feature_generators/ms2.py



		class MS2FeatureGenerator(FeatureGeneratorBase):
		    """DeepLC retention time-based feature generator."""

Collaborator

paretje Jan 6, 2025

I guess this docstring should be updated?

ms2rescore/feature_generators/ms2.py

+                          }
+                      except AttributeError:
+                          raise ParseSpectrumError(
+                              "Could not parse spectrum IDs using ´spectrum_id_pattern´. Please make sure that there is a capturing in the pattern."

Collaborator

paretje Jan 6, 2025

Do you mean a capture group with "a capturing"?
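
For context on the terminology: a capture group is the parenthesised part of a regular expression whose matched text can be retrieved with `match.group(1)`. The sketch below uses only the standard-library `re` module; the function name, the example pattern, and the spectrum ID strings are hypothetical and not taken from the PR.

```python
import re


def extract_spectrum_id(raw_id, spectrum_id_pattern):
    """Return the text matched by the first capture group of the pattern."""
    match = re.search(spectrum_id_pattern, raw_id)
    try:
        return match.group(1)
    except AttributeError:
        # re.search returned None: the pattern did not match the ID at all.
        raise ValueError(
            f"Pattern {spectrum_id_pattern!r} did not match {raw_id!r}."
        ) from None
    except IndexError:
        # The pattern matched, but it contains no capture group.
        raise ValueError(
            f"Pattern {spectrum_id_pattern!r} has no capture group."
        ) from None
```

With a pattern such as `scan=(\d+)`, the parentheses form the capture group, so the function returns only the digits after `scan=`.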

ms2rescore/feature_generators/ms2.py

Comment on lines +309 to +319

+                  for peak in annotated_spectrum:
+                      for fragment in peak.annotation:
+                          ion_type = infer_fragment_identity(fragment)
+                          if ion_type == 'b':
+                              b_intensities.append(peak.intensity)
+                          if ion_type == 'y':
+                              y_intensities.append(peak.intensity)
+                  return b_intensities, y_intensities

Collaborator

paretje Jan 6, 2025

Suggested change

                   for peak in annotated_spectrum:
                       for fragment in peak.annotation:
                           ion_type = infer_fragment_identity(fragment)
                           if ion_type == 'b':
                               b_intensities.append(peak.intensity)
-                          if ion_type == 'y':
+                          elif ion_type == 'y':
                               y_intensities.append(peak.intensity)
                   return b_intensities, y_intensities
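
A self-contained sketch of the same loop shape, for readers without the PR checked out. Peaks are modelled here as plain `(intensity, ion_types)` tuples, which is an assumption made for illustration; the real code works on an annotated spectrum object and calls `infer_fragment_identity`.

```python
def split_b_y_intensities(peaks):
    """Collect intensities of peaks annotated as b or y ions.

    `peaks` is a hypothetical stand-in: an iterable of
    (intensity, list_of_ion_type_strings) tuples.
    """
    b_intensities = []
    y_intensities = []
    for intensity, ion_types in peaks:
        for ion_type in ion_types:
            if ion_type == "b":
                b_intensities.append(intensity)
            elif ion_type == "y":
                # elif: an ion type already handled as 'b' is not tested again.
                y_intensities.append(intensity)
    return b_intensities, y_intensities
```

Because a single ion type cannot be both `'b'` and `'y'`, `elif` does not change the result; it only skips a comparison that can never succeed.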

ms2rescore/feature_generators/ms2.py

		return annotated_spectrum.spectrum


		def factorial(n):

Collaborator

paretje Jan 6, 2025

Any reason to use a custom function instead of math.factorial?
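
As a point of comparison, the standard library already covers this. The sketch below contrasts a hand-written factorial (a stand-in, since the body of the reviewed helper is not shown here) with `math.factorial`, and adds `math.lgamma(n + 1)` as an option in case only the logarithm of the factorial is needed, which is an assumption about how the value is used downstream.

```python
import math


def custom_factorial(n):
    """Hand-written factorial; a stand-in for the helper under review."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def log_factorial(n):
    """Natural log of n!, computed without building the large integer."""
    return math.lgamma(n + 1)
```

For non-negative integers, `custom_factorial(n)` and `math.factorial(n)` agree, so the built-in can replace the helper without changing results.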

ms2rescore/feature_generators/ms2.py

+                      if spectrum_filepath.suffix.lower() == ".mzml":
+                          return mzml.PreIndexedMzML(str(spectrum_filepath))
+                      elif spectrum_filepath.suffix.lower() == ".mgf":
+                          return mgf.IndexedMGF(str(spectrum_filepath))

Collaborator

paretje Jan 6, 2025

It might be better to avoid failing silently: add an else branch that raises an exception, e.g. NotImplementedError or ValueError.
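
A minimal sketch of what that could look like. To keep the example runnable without pyteomics, the function returns a format label instead of constructing a reader; the function name and the error message are hypothetical.

```python
from pathlib import Path


def select_spectrum_format(spectrum_filepath):
    """Map a spectrum file path to a format label, failing loudly otherwise."""
    suffix = Path(spectrum_filepath).suffix.lower()
    if suffix == ".mzml":
        return "mzml"
    elif suffix == ".mgf":
        return "mgf"
    else:
        # Explicit failure instead of implicitly returning None.
        raise ValueError(
            f"Unsupported spectrum file type {suffix!r}; expected .mzML or .mgf."
        )
```

Without the final branch, an unsupported file would make the function return `None`, and the error would only surface later, further away from its cause.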

Labels

feature