Documentation or reference for missing models #78

Closed
vrkosk opened this issue Jul 30, 2024 · 2 comments

vrkosk commented Jul 30, 2024

I can map most of the model names from DeepLCModels to the supplementary table 2 in the 2021 publication. A couple gaps are filled by issue #77 (thanks!).

I cannot find any information about these models, which were added after the publication:

full_hc_PXD008783_median_calibrate
full_hc_TMTpro_train_msv000088167_median
full_hc_mod_deeplc_train_filtered
full_hc_multretra_train
full_hc_phospho_kai_li
full_hc_tmt_data_consensus_ticnum_filtered

The PRIDE project gives some clues, of course, but was the data set on MassIVE ever published?

@RobbinBouwmeester
Member

Hi,

Yes, msv000088167 has been published: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=2f82c5f336a441d7a7aee378d84f7a58

With regard to the other models, these were mostly trained on internal data. I did make the models public, as they could be especially useful for TMT (full_hc_tmt_data_consensus_ticnum_filtered), phosphopeptides (full_hc_phospho_kai_li), or modifications in general (full_hc_mod_deeplc_train_filtered). Unfortunately, I am unable to give you a timeline for when this data will be publicly available.

With regard to multretra (full_hc_multretra_train), that was an experimental run where the model was iteratively trained on a large number of datasets. Each dataset was considered a separate entity and was only trained on for a couple of epochs before switching to a new dataset. Although I cannot give any guarantees, it seems this model actually performs very well across a large number of datasets.
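For readers who want a concrete picture of that schedule, here is a minimal sketch of "train a couple of epochs on one dataset, then switch to the next". It is not the actual DeepLC training code: the toy linear model, the function names `train_epoch` and `rotate_training`, and the `rounds` parameter (whether the run cycled through the datasets more than once is not stated above) are all assumptions made for illustration.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Sample = Tuple[float, float]    # (toy feature, retention time); hypothetical data
Weights = Tuple[float, float]   # (slope, intercept) of a toy linear model


def train_epoch(weights: Weights, data: Sequence[Sample], lr: float) -> Weights:
    """One pass of plain SGD over a single dataset (stand-in for a real model)."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y   # prediction error for this sample
        w -= lr * err * x       # gradient step for the slope
        b -= lr * err           # gradient step for the intercept
    return (w, b)


def rotate_training(
    datasets: Mapping[str, Sequence[Sample]],
    epochs_per_dataset: int = 2,   # "a couple of epochs" per dataset
    rounds: int = 1,               # assumption: number of passes over all datasets
    lr: float = 0.01,
) -> Tuple[Weights, List[str]]:
    """Train the same weights on each dataset in turn, a few epochs at a time."""
    weights: Weights = (0.0, 0.0)
    schedule: List[str] = []       # order in which datasets were visited
    for _ in range(rounds):
        for name, data in datasets.items():
            # Each dataset is treated as a separate entity: no mixing of samples.
            for _ in range(epochs_per_dataset):
                weights = train_epoch(weights, data, lr)
            schedule.append(name)
    return weights, schedule


if __name__ == "__main__":
    toy: Dict[str, List[Sample]] = {
        "dataset_a": [(1.0, 2.0), (2.0, 4.0)],
        "dataset_b": [(3.0, 6.5), (4.0, 8.5)],
    }
    final_weights, visited = rotate_training(toy, epochs_per_dataset=2, rounds=2)
    print(visited)
    print(final_weights)
```

The point of the sketch is only the loop structure: the weights carry over from one dataset to the next, while each dataset is visited on its own for a small, fixed number of epochs.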

Hope that helps :),

Robbin


vrkosk commented Jul 31, 2024

Yes, that's useful. I think this should be highlighted in the DeepLCModels README.md.
