WIP: support ProForma delta mass notation for individual modifications #17

Draft
wants to merge 5 commits into
base: main

Conversation

sgibb
Member

@sgibb sgibb commented Dec 25, 2024

This PR is WIP and should solve issues #14 and #15. Both ask for individual modifications on a specific amino acid in the peptide sequence. I added a very basic HUPO-PSI ProForma parser that only supports the delta mass notation.

A delta mass is given as a + or - mass in square brackets directly after the amino acid of interest, e.g. "P[+10]QR" or "P[-1.01]QR".
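To make the notation concrete, here is a minimal sketch of how such a string could be split into the bare sequence and one delta mass per residue. `parseDeltaMass` is a hypothetical helper written for illustration only; it is not the parser in this PR and it deliberately rejects everything except signed numeric tags.

```r
# Hypothetical helper (not part of PSMatch): split a peptide with ProForma
# delta mass tags into the bare sequence and one delta mass per residue.
parseDeltaMass <- function(x) {
    # one match per residue: an uppercase letter, optionally followed by
    # a signed number in square brackets
    rx <- "[A-Z](\\[[+-][0-9]+(\\.[0-9]+)?\\])?"
    m <- regmatches(x, gregexpr(rx, x, perl = TRUE))[[1L]]
    # anything left unmatched (e.g. a CV name in brackets) is unsupported
    if (sum(nchar(m)) != nchar(x))
        stop("unsupported notation in '", x, "'")
    aa <- substr(m, 1L, 1L)
    # drop the residue letter and the brackets, keep the signed number
    tag <- gsub("^[A-Z]\\[?|\\]$", "", m, perl = TRUE)
    delta <- numeric(length(m))
    tagged <- nzchar(tag)
    delta[tagged] <- as.numeric(tag[tagged])
    list(sequence = paste(aa, collapse = ""), aa = aa, delta = delta)
}

parseDeltaMass("P[-1.01]QR")
```

Untagged residues get a delta mass of 0, so the result can be added element-wise to the residue masses.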

Problems:

  • A ProForma delta mass modification is added on top of the fixed global modifications given with the argument "modifications".
  • The ProForma specification recommends using CV links (e.g. unimod - our old, never finished package) and discourages the "delta mass" notation.
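The first problem can be illustrated with a small sketch. It assumes the behaviour described above (the site-specific delta mass is simply summed with any fixed modification on the same residue); the peptide "AC[+10]K" and all variable names are made up for the example.

```r
# Hypothetical illustration: a site-specific delta mass stacks on top of a
# fixed modification that targets the same residue.
aa    <- c("A", "C", "K")        # residues of "AC[+10]K"
delta <- c(0, 10, 0)             # delta masses parsed from the tags
fixed <- c(C = 57.02146)         # fixed modification, as in the default output

fixedShift <- unname(fixed[aa])  # NA for residues without a fixed modification
fixedShift[is.na(fixedShift)] <- 0

totalShift <- fixedShift + delta # mass shift applied to each residue
totalShift
```

Here the cysteine ends up with both shifts (57.02146 + 10), which may or may not be what a user writing "C[+10]" expects.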

TODOs:

  • Documentation.
  • Add modification information to the "seq" column in the returned data.frame.

calculateFragments("PQR")
#> Modifications used: C=57.02146
#>          mz ion type pos z seq
#> 1  98.06004  b1    b   1 1   P
#> 2 226.11862  b2    b   2 1  PQ
#> 3 175.11895  y1    y   1 1   R
#> 4 303.17753  y2    y   2 1  QR
#> 5 157.10839 y1_   y_   1 1   R
#> 6 285.16697 y2_   y_   2 1  QR
#> 7 286.15098 y2*   y*   2 1  QR
calculateFragments("P[+10]QR")
#> Modifications used: C=57.02146
#>         mz ion type pos z seq
#> 1 108.0600  b1    b   1 1   P
#> 2 236.1186  b2    b   2 1  PQ
#> 3 175.1190  y1    y   1 1   R
#> 4 303.1775  y2    y   2 1  QR
#> 5 157.1084 y1_   y_   1 1   R
#> 6 285.1670 y2_   y_   2 1  QR
#> 7 286.1510 y2*   y*   2 1  QR
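The shift in the b ions can be checked by hand. The sketch below recomputes the singly charged b and y ions from standard monoisotopic residue masses; `fragmentMz` is a hypothetical helper, not the PSMatch implementation, and it ignores neutral losses (the y_ and y* rows) and higher charge states.

```r
# Standard monoisotopic masses (only the residues needed for "PQR").
residue <- c(P = 97.05276, Q = 128.05858, R = 156.10111)
water   <- 18.01056
proton  <- 1.007276

# Hypothetical helper: singly charged b and y ions, with an optional
# per-residue delta mass added to the residue masses.
fragmentMz <- function(aa, delta = numeric(length(aa))) {
    m <- unname(residue[aa]) + delta
    n <- length(m)
    i <- seq_len(n - 1L)                     # full-length ions are excluded
    b <- cumsum(m)[i] + proton               # N-terminal fragments
    y <- cumsum(rev(m))[i] + water + proton  # C-terminal fragments
    data.frame(mz = c(b, y),
               ion = paste0(rep(c("b", "y"), each = n - 1L), i))
}

fragmentMz(c("P", "Q", "R"))                 # unmodified
fragmentMz(c("P", "Q", "R"), c(10, 0, 0))    # "P[+10]QR"
```

Because the modified proline is the N-terminal residue, every b ion contains it and is shifted by 10, while y1 and y2 do not contain it and stay unchanged, matching the two outputs above.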

If we decide to support this notation, we could also discuss supporting the ProForma global modification notation "SEQUENCE" (a modification enclosed in < and > is placed in front of the sequence), and perhaps removing the "modification" argument.
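As a rough sketch of what that could look like, the following assumes the ProForma fixed modification form with a delta mass, e.g. "<[+57.02146]@C>ACK". `parseGlobalDeltaMass` is a hypothetical helper; it handles a single tag with a single target residue only and is not part of this PR.

```r
# Hypothetical helper: read one leading global delta mass tag of the assumed
# form "<[+mass]@X>" and return it as a named vector plus the remaining sequence.
parseGlobalDeltaMass <- function(x) {
    rx <- "^<\\[([+-][0-9]+(\\.[0-9]+)?)\\]@([A-Z])>"
    m <- regmatches(x, regexec(rx, x, perl = TRUE))[[1L]]
    if (!length(m))                       # no global tag present
        return(list(sequence = x, fixed = numeric()))
    # m[1] = whole tag, m[2] = signed mass, m[4] = target residue
    list(sequence = substring(x, nchar(m[1L]) + 1L),
         fixed = stats::setNames(as.numeric(m[2L]), m[4L]))
}

parseGlobalDeltaMass("<[+57.02146]@C>ACK")
```

The `fixed` element has the same shape as the named vector shown in the "Modifications used" message, which is why such a tag could in principle replace the separate argument.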

@sgibb sgibb added the enhancement New feature or request label Dec 25, 2024
@sgibb sgibb requested review from lgatto and jorainer December 25, 2024 22:50
@sgibb sgibb self-assigned this Dec 25, 2024
@lgatto
Member

lgatto commented Dec 26, 2024

Thanks @sgibb for this. I do agree that it would be a nice addition, and a good reason to revive (and possibly extend) the unimod package. What would be the best way to proceed?

  • Should we focus on the parser in PSMatch?
  • Revive the unimod package so that UniMod modifications can be used?
  • Extend the unimod package (possibly rename it to Modifications) to handle the PSI-MOD format?
  • Work on a more complete ProForma parser (here or in another package)?

@lgatto
Member

lgatto commented Dec 26, 2024

Re PSI-MOD, the rols package can be used to query the ontology:

> library("rols")
> OlsSearch(q = "H2PO3", ontology = "PSI-MOD", exact = TRUE) |> 
+ olsSearch() |> 
+ as.data.frame()
                                       iri ontology_name ontology_prefix short_form
1 http://purl.obolibrary.org/obo/MOD_01455           mod             MOD  MOD_01455
2 http://purl.obolibrary.org/obo/MOD_01456           mod             MOD  MOD_01456
3 http://purl.obolibrary.org/obo/MOD_00696           mod             MOD  MOD_00696
   description                    label    obo_id  type
1 A protei.... O-phosphorylated residue MOD:01455 class
2 A protei.... N-phosphorylated residue MOD:01456 class
3 A protei....   phosphorylated residue MOD:00696 class

but if we want to go that way, we should store the modification tables in the package.

@sgibb
Member Author

sgibb commented Dec 26, 2024

.... What would be the best way to proceed?

* Should we focus on the parser in PSMatch?

* Revive the unimod package so that UniMod modifications can be used?

* Extend the unimod package (possibly rename it to `Modifications`) to handle the PSI-MOD format?

* Work on a more complete ProForma parser (here or in another package)

The unimod package was and is a great example of over-engineering. We (I) lost sight of the KISS principle and never finished this very useful package. To be honest, I cannot motivate myself to resume work on unimod or on a new modification package.
While it would be great to have a Modifications package for PSI-MOD and/or a complete ProForma parser, I vote for a simple working solution (ignoring the CV for now) that covers many of our users' use cases (as suggested in this PR).

@lgatto
Member

lgatto commented Dec 26, 2024

+1 for KISS.

I'll discuss with @guideflandre to what extent supporting more formal modifications would be useful for his work, and follow up if it seems worth investing the time. Let's start simple.
