Spectrum feature generator #178

Draft
wants to merge 32 commits into base: main

ArthurDeclercq
Collaborator

No description provided.

@RalfG added this to the v3.2.0 milestone Aug 20, 2024
@RalfG added the feature new feature label Aug 20, 2024
(psm_list["qvalue"] <= 0.01)
& (psm_list["rank"] <= max_rank)
& (~psm_list["is_decoy"])
& ([metadata.get("original_psm", True) for metadata in psm_list["metadata"]])
Collaborator

This seems like it might be quite inefficient; however, I'm not sure it can be improved significantly, given that original_psm is stored in the metadata dict. Keeping it as a Series instead of a list might be better, or adding it as a column to the dataframe.
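
To illustrate the Series/column idea, here is a minimal sketch that uses a plain pandas DataFrame as a stand-in for the PSM table. The data, the `psms` name, and the extra `original_psm` column are assumptions made for the example and are not taken from the PR. The point is only that the flag is pulled out of the metadata dicts once, after which the whole filter is a vectorized boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for the PSM table; column names mirror the snippet under review.
psms = pd.DataFrame(
    {
        "qvalue": [0.001, 0.02, 0.005, 0.003],
        "rank": [1, 1, 2, 1],
        "is_decoy": [False, False, False, True],
        "metadata": [{"original_psm": True}, {}, {"original_psm": False}, {}],
    }
)

# Extract the flag once into a boolean column; a missing key defaults to True.
psms["original_psm"] = (
    psms["metadata"].map(lambda m: m.get("original_psm", True)).astype(bool)
)

max_rank = 1  # assumed value for the example
mask = (
    (psms["qvalue"] <= 0.01)
    & (psms["rank"] <= max_rank)
    & (~psms["is_decoy"])
    & psms["original_psm"]
)
selected = psms[mask]
```

Whether this is measurably faster than the list comprehension depends on how often the filter runs; the extraction itself still visits every dict once.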

Comment on lines +121 to +124
if original_matched_ions_pct > matched_ions[i]:
    keep[i] = False
else:
    keep[i] = True
Collaborator

Suggested change
-if original_matched_ions_pct > matched_ions[i]:
-    keep[i] = False
-else:
-    keep[i] = True
+keep[i] = original_matched_ions_pct <= matched_ions[i]
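
As a quick check of this suggestion, the sketch below compares the branching version with the direct boolean assignment on made-up values (the numbers are assumptions, not data from the PR). It also shows the one case where the two forms differ: when a value is NaN.

```python
original_matched_ions_pct = 0.5  # hypothetical value
matched_ions = [0.2, 0.5, 0.9]   # hypothetical values

# Branching version, as in the code under review.
keep_branching = [None] * len(matched_ions)
for i in range(len(matched_ions)):
    if original_matched_ions_pct > matched_ions[i]:
        keep_branching[i] = False
    else:
        keep_branching[i] = True

# Direct assignment of the comparison result, as suggested.
keep_direct = [original_matched_ions_pct <= m for m in matched_ions]

# Caveat: every ordered comparison with NaN is False, so the forms disagree there.
nan = float("nan")
branching_with_nan = not (nan > 0.5)  # if/else form: PSM is kept
direct_with_nan = nan <= 0.5          # one-liner: PSM is dropped
```

If NaN values can occur in `matched_ions_pct`, `keep[i] = not (original_matched_ions_pct > matched_ions[i])` preserves the original behavior exactly.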

Comment on lines +108 to +111
if "matched_ions_pct" in psm_list[0].rescoring_features:
    matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
else:
    return psm_list
Collaborator

Suggested change
-if "matched_ions_pct" in psm_list[0].rescoring_features:
-    matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
-else:
-    return psm_list
+if "matched_ions_pct" not in psm_list[0].rescoring_features:
+    return psm_list
+else:
+    matched_ions = [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
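
The early-return pattern in this suggestion can be sketched in isolation as follows. The `PSM` dataclass and the `get_matched_ions` helper are hypothetical stand-ins, not the PR's classes. Unlike the code under review, the sketch returns `None` rather than the unchanged list, and it also guards against an empty list, where `psm_list[0]` would raise an `IndexError`.

```python
from dataclasses import dataclass, field


@dataclass
class PSM:
    """Hypothetical stand-in for a PSM object with a rescoring_features dict."""

    rescoring_features: dict = field(default_factory=dict)


def get_matched_ions(psm_list):
    """Return the matched_ions_pct values, or None if the feature is missing."""
    # Guard clause: bail out early on empty input or a missing feature.
    if not psm_list or "matched_ions_pct" not in psm_list[0].rescoring_features:
        return None
    # No else branch is needed: the guard above has already returned.
    return [psm.rescoring_features["matched_ions_pct"] for psm in psm_list]
```

Because the guard returns, the `else:` in the suggested change is optional; dropping it keeps the main path unindented.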



class MS2FeatureGenerator(FeatureGeneratorBase):
    """DeepLC retention time-based feature generator."""
Collaborator

I guess this docstring should be updated?

}
except AttributeError:
raise ParseSpectrumError(
"Could not parse spectrum IDs using ´spectrum_id_pattern´. Please make sure that there is a capturing in the pattern."
Collaborator

By "a capturing", do you mean a capture group?
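
For context, a minimal sketch of how a capture group behaves with Python's `re` module, and why a failed match surfaces as an `AttributeError`. The pattern and the spectrum titles are made-up examples, not values from the PR.

```python
import re

spectrum_id_pattern = r"scan=(\d+)"  # hypothetical pattern with one capture group
title = "controllerType=0 controllerNumber=1 scan=1234"  # hypothetical spectrum title

# With a capture group, group(1) returns only the captured part.
match = re.search(spectrum_id_pattern, title)
spectrum_id = match.group(1)

# No match: re.search returns None, so calling .group() raises AttributeError.
try:
    re.search(spectrum_id_pattern, "index=5").group(1)
    no_match_error = None
except AttributeError as exc:
    no_match_error = type(exc).__name__

# Match but no capture group in the pattern: group(1) raises IndexError instead.
try:
    re.search(r"scan=\d+", title).group(1)
    no_group_error = None
except IndexError as exc:
    no_group_error = type(exc).__name__
```

Note that a pattern without any capture group fails with `IndexError`, which an `except AttributeError` clause alone would not catch.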

Comment on lines +309 to +319
for peak in annotated_spectrum:

    for fragment in peak.annotation:

        ion_type = infer_fragment_identity(fragment)

        if ion_type == 'b':
            b_intensities.append(peak.intensity)
        if ion_type == 'y':
            y_intensities.append(peak.intensity)
return b_intensities, y_intensities
Collaborator

Suggested change
-for peak in annotated_spectrum:
-    for fragment in peak.annotation:
-        ion_type = infer_fragment_identity(fragment)
-        if ion_type == 'b':
-            b_intensities.append(peak.intensity)
-        if ion_type == 'y':
-            y_intensities.append(peak.intensity)
-return b_intensities, y_intensities
+for peak in annotated_spectrum:
+    for fragment in peak.annotation:
+        ion_type = infer_fragment_identity(fragment)
+        if ion_type == 'b':
+            b_intensities.append(peak.intensity)
+        elif ion_type == 'y':
+            y_intensities.append(peak.intensity)
+return b_intensities, y_intensities
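
A self-contained sketch of the suggested loop, for illustration only. The `Peak` tuple, the string annotations, and this version of `infer_fragment_identity` are hypothetical stand-ins; the real annotated spectrum objects and helper in the PR may look different.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotated peak: an intensity plus fragment labels.
Peak = namedtuple("Peak", ["intensity", "annotation"])


def infer_fragment_identity(fragment):
    """Hypothetical helper: return 'b' or 'y' based on the label's first letter."""
    return fragment[0] if fragment[:1] in ("b", "y") else None


def split_intensities(annotated_spectrum):
    """Collect the intensities of peaks annotated as b-ions and as y-ions."""
    b_intensities, y_intensities = [], []
    for peak in annotated_spectrum:
        for fragment in peak.annotation:
            ion_type = infer_fragment_identity(fragment)
            if ion_type == "b":
                b_intensities.append(peak.intensity)
            elif ion_type == "y":  # skipped once the 'b' branch has matched
                y_intensities.append(peak.intensity)
    return b_intensities, y_intensities


spectrum = [Peak(10.0, ["b2"]), Peak(5.0, ["y3", "b1"]), Peak(1.0, [])]
b_intensities, y_intensities = split_intensities(spectrum)
```

Since `ion_type` holds a single value per fragment, `elif` gives the same result as two `if` statements and only saves the second comparison.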

return annotated_spectrum.spectrum


def factorial(n):
Collaborator

Any reason to use a custom function instead of math.factorial?
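
For reference, a short sketch comparing a hand-written factorial with the standard library. The body of `factorial` below is an assumed typical implementation, since the PR's version is not shown here.

```python
import math


def factorial(n):
    """Hypothetical hand-written version, shown only for comparison."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# The standard library gives the same values for non-negative integers.
same_results = all(factorial(n) == math.factorial(n) for n in range(20))

# math.factorial also validates its input: negative values raise ValueError.
try:
    math.factorial(-1)
    negative_error = None
except ValueError as exc:
    negative_error = type(exc).__name__

# If the factorial is only used for binomial coefficients (an assumption),
# math.comb (Python 3.8+) computes them directly.
n_choose_k = math.comb(5, 2)
```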

if spectrum_filepath.suffix.lower() == ".mzml":
    return mzml.PreIndexedMzML(str(spectrum_filepath))
elif spectrum_filepath.suffix.lower() == ".mgf":
    return mgf.IndexedMGF(str(spectrum_filepath))
Collaborator

It might be better to avoid failing silently: add an else branch that raises, e.g., a NotImplementedError or ValueError.
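
A minimal sketch of such an explicit failure branch. To stay self-contained, the hypothetical `select_reader_name` function returns a format name instead of opening a file; in the PR, the two supported branches would return the readers shown in the snippet above.

```python
from pathlib import Path


def select_reader_name(spectrum_filepath: Path) -> str:
    """Return the spectrum format for a file, raising on unsupported extensions."""
    suffix = spectrum_filepath.suffix.lower()
    if suffix == ".mzml":
        return "mzml"
    elif suffix == ".mgf":
        return "mgf"
    else:
        # Without this branch the function would implicitly return None,
        # and the failure would only show up later, far from its cause.
        raise ValueError(f"Unsupported spectrum file extension: '{suffix}'")
```

`ValueError` fits when the extension is simply invalid input; `NotImplementedError` reads better if support for more formats is planned.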

4 participants