smith-chem-wisc/ProteoformClassifier


ProteoformClassifier

Classifies proteoform identifications and validates that proteoform results are transparent about ambiguity. Check out the wiki page for software details!

Quick start guide

  1. Download the most recent version of ProteoformClassifier here. "GUI.zip" contains a helpful user interface. If you're too cool for user interfaces, you can instead download "CMD.zip" to run ProteoformClassifier from the command line. This quick start guide focuses on GUI.zip.

  2. Unzip "GUI.zip" and run "GUI.exe". You may be asked to install .NET.

  3. On the left side of the new GUI screen, click on "Validate Software". This page lets you check whether your proteoform identification software reports ambiguous proteoform identifications.

  4. OPTIONAL: To validate your top-down search software, analyze the test file Validation.mzML with your identification software using the protein database Validation.fasta and the parameters described in "README_Parameters.txt". These files can be downloaded by right-clicking on them and selecting "Save link as...".

  5. If you skipped step #4, download the formatted "Results.tsv" produced from MetaMorpheus output. OPTIONAL: If you analyzed the Validation.mzML data file with your identification software in step #4, format your search output into a .tsv file containing the scan number, proteoform sequence(s), and gene(s) of origin for each PrSM.

  6. Drag and drop the formatted results ("Results.tsv") onto the GUI window. Press "Validate Software". If the output is transparent about ambiguity, the Output Terminal at the bottom of the page will say "Success!". If not, it will show error messages to help you determine which PrSMs and ambiguities are missing.

  7. After validating that your software is transparent about reporting ambiguity, you can classify proteoform output from that software using the "Classify PrSMs" tab on the left side of the window. This module looks similar to the "Validate Software" tab, but it accepts multiple input files and writes classified proteoform result files. NOTE: The Validate Software tab is only for the Validation.mzML data. If you run full dataset results on the Validate Software tab instead of switching to the Classify PrSMs tab, you will get many useless error messages.

  8. The "Classify PrSMs" workflow produces two output files: "ClassifiedResults" and "ClassifiedSummary".

  9. "ClassifiedResults" contains all of the original input data and adds a fourth column containing the classification level for each PrSM.

  10. "ClassifiedSummary" contains a brief synopsis of how many PrSMs were identified at each level.
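If your identification software does not export a three-column .tsv directly, the formatting step above (scan number, proteoform sequence(s), and gene(s) of origin for each PrSM) could be scripted. The following is a minimal Python sketch, not part of ProteoformClassifier. The header names, the "|" separator for ambiguous sequences or genes, the example sequences, and the `write_results_tsv` helper are all assumptions made for illustration. Check the wiki for the exact input format before relying on it.

```python
import csv
import io


def write_results_tsv(prsms, handle):
    """Write one row per PrSM: scan number, sequence(s), gene(s).

    `prsms` is an iterable of (scan_number, sequences, genes) tuples.
    The header names and the "|" separator are assumptions, not the
    documented ProteoformClassifier input format.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Scan Number", "Proteoform Sequence", "Gene"])
    for scan_number, sequences, genes in prsms:
        # Keep every candidate so that ambiguity stays visible in the file.
        writer.writerow([scan_number, "|".join(sequences), "|".join(genes)])


# Hypothetical PrSMs: the second one is ambiguous between two sequences.
example_prsms = [
    (1, ["PEPTIDE"], ["GENE1"]),
    (2, ["PEPT[Phospho]IDE", "PEPTIDE[Phospho]"], ["GENE1"]),
]

buffer = io.StringIO()
write_results_tsv(example_prsms, buffer)
print(buffer.getvalue())
```

To produce a file you can drag onto the GUI window, pass a file handle instead of the in-memory buffer, for example `open("Results.tsv", "w", newline="")`.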

Thanks for trying out ProteoformClassifier! If you have any questions, please check out our wiki or open an issue and we'll get back to you ASAP.
