Peptide Genomic Coordinate issue while applying GalaxyP proteogenomics Tutorial3 #524
@luisfdez94 @subinamehta Luis, can you share the Database creation history? I want to figure out why there is a SPACE character after the protein ID from the CustomProDB workflow output.
Should have been:
Thank you for your quick answer.
Here it is: https://usegalaxy.eu/u/_luisfr/h/peptidegenomiccoordinateissuedbcreation
@luisfdez94 @subinamehta The protein IDs from customProDB have a SPACE character at the end of the ID. I'm asking @chambm whether that seems correct. Most steps in these workflows do not handle an ID ending with a SPACE. We could add a step after customProDB that uses regex tools to remove the SPACE.
@JJ: I think the workflow already takes care of that.
*Subina Mehta*, Bioinformatics Researcher, Dept. of Biochemistry, Molecular Biology and Biophysics, University of Minnesota (*www.galaxyp.org*)
Thanks to help from @jj-umn and the Galaxy-P team, I was able to solve this issue. I had to modify the header of every sequence in the customized database (FASTA) produced at the end of Galaxy-P Tutorial 1: Database creation. For that, I used the Regex Find And Replace v1.0.0 tool with the parameters shown at the end of this message.
Another important point if you follow the Galaxy-P hands-on tutorials: in Tutorial 2: Database search, feed this modified custom database to the mz_to_sqlite tool! I had mistakenly been using the original custom database.
[Image: Regex Find And Replace v1.0.0 parameters]
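For anyone who wants to do the same cleanup outside Galaxy, here is a minimal Python sketch. It assumes the only problem is trailing whitespace on FASTA header lines (the lines starting with `>`); it is not the exact Regex Find And Replace pattern used above, and the function name `strip_header_whitespace` is hypothetical.

```python
import re

# Trailing spaces/tabs at the end of a line (hypothetical pattern,
# not necessarily the one configured in the Galaxy tool).
TRAILING_WS = re.compile(r"[ \t]+$")


def strip_header_whitespace(fasta_text: str) -> str:
    """Remove trailing spaces/tabs from FASTA header ('>') lines only.

    Sequence lines are left untouched; a final newline is preserved.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = TRAILING_WS.sub("", line)
        cleaned.append(line)
    trailer = "\n" if fasta_text.endswith("\n") else ""
    return "\n".join(cleaned) + trailer


example = ">ENSP00000267884_A82P,P124A \nMSEQ\n"
print(repr(strip_header_whitespace(example)))
# → '>ENSP00000267884_A82P,P124A\nMSEQ\n'
```

Restricting the substitution to header lines keeps the sequence lines byte-for-byte identical, which is the safer choice if the database is reused by other tools.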
Galaxy server: usegalaxy.eu
History link: https://usegalaxy.eu/u/_luisfr/h/peptidegenomiccoordinateissue
Tool version: Galaxy Version 0.1.1
While running Peptide Genomic Coordinate on our dataset (human), following the Galaxy-P hands-on tutorial [Tutorial 3: Novel peptides](https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/proteogenomics-novel-peptide-analysis/tutorial.html) (see also Tutorial 1: Database creation and Tutorial 2: Database search), the “Peptide Genomic Coordinate” tool returned an empty file.
I went through the source code and ran it locally in debug mode. Execution never entered the `if` block (line 47), i.e. the `coordinates` variable was empty on every iteration. However, if I change line 41 from `acc = each[1]` to `acc = each[1].strip()` (trimming whitespace), it works. I noticed that the protein accession numbers (Ensembl ENSP) in the mz_to_sqlite input file sometimes end with a trailing space, e.g. 'ENSP00000267884_A82P,P124A '. When line 44 queries the other input, Peptide_Genomic_Coordinate.sqlite, to fill `coordinates`, the lookup fails because the accessions in that file have no trailing space, e.g. 'ENSP00000267884_A82P,P124A'.
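The mismatch can be reproduced with an in-memory SQLite database. This is a minimal sketch: the table `protein_coordinates`, its columns, and the coordinate values are hypothetical placeholders, not the real schema of Peptide_Genomic_Coordinate.sqlite; only the accession string comes from this report. SQLite's default (BINARY) comparison treats the trailing space as significant, so the exact-match lookup on the raw accession returns nothing while the `.strip()` version finds the row.

```python
import sqlite3

# Hypothetical schema standing in for Peptide_Genomic_Coordinate.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_coordinates "
    "(accession TEXT, chrom TEXT, start INTEGER, stop INTEGER)"
)
conn.execute(
    "INSERT INTO protein_coordinates VALUES (?, ?, ?, ?)",
    ("ENSP00000267884_A82P,P124A", "chrN", 100, 200),  # placeholder coordinates
)


def lookup(acc: str):
    """Return coordinate rows whose accession exactly equals acc."""
    return conn.execute(
        "SELECT chrom, start, stop FROM protein_coordinates WHERE accession = ?",
        (acc,),
    ).fetchall()


# Accession as it arrives from the mz_to_sqlite input, with a trailing space.
raw_acc = "ENSP00000267884_A82P,P124A "

print(lookup(raw_acc))          # [] : the trailing space breaks the exact match
print(lookup(raw_acc.strip()))  # [('chrN', 100, 200)]
```

Stripping on the query side (as in the local `each[1].strip()` change) and cleaning the FASTA headers upstream are two independent ways to make both sides agree.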
To help, I uploaded to the history the input files used to run Peptide Genomic Coordinate (data #1, #5 and #7) and the empty file produced when running this tool on Galaxy (data #8). I also uploaded the customized database (data #3) and the output file produced when I ran the tool locally with the modification described above (data #9).
Thank you for your help.