Subset-neighbor-search is a method for FDR control from tandem mass spectrometry data, applicable when only a subset of peptides or proteins are of interest

Noble-Lab/subset-neighbor-search

subset-neighbor-search

Subset-neighbor-search is a method for FDR control from tandem mass spectrometry data, applicable when only a subset of peptides or proteins are of interest.

Usage

You can run subset-neighbor-search with the following workflow.

First, generate a list of relevant peptides and a list of irrelevant peptides. You can do this with the tide-index command in Crux: run the following command once for the relevant FASTA database and once for the irrelevant FASTA database.

path-to-crux/crux tide-index --peptide-list T name-of-fasta-file.fa

The above command generates a text file named tide-index.peptides.txt that contains four columns: target sequence, decoy sequence, mass, and protein ID. Once you have both tide-index.peptides.txt files (one per database), run the pepsim.py script. The following help message shows how to run the script and what its output looks like.
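To work with the peptide lists programmatically, the target sequences can be pulled out of the first column. The following is a minimal sketch, not part of this repository: it assumes the file is tab-delimited, and the function name read_target_sequences and its has_header switch are hypothetical (check your own tide-index.peptides.txt to see whether it starts with a header row).

```python
import csv
from typing import Iterable, List


def read_target_sequences(lines: Iterable[str], has_header: bool = False) -> List[str]:
    """Return the unique target sequences (first column) in first-seen order.

    Assumes tab-delimited rows of: target sequence, decoy sequence, mass, protein ID.
    """
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    seen = set()
    targets = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        seq = row[0].strip()
        if seq not in seen:
            seen.add(seq)
            targets.append(seq)
    return targets


# Made-up rows for illustration only (not real Crux output).
example = [
    "PEPTIDEK\tKEDITPEP\t927.4\tprotA",
    "SAMPLER\tRELPMAS\t803.4\tprotB",
    "PEPTIDEK\tKEDITPEP\t927.4\tprotC",
]
print(read_target_sequences(example))
```

With a real file, pass an open file handle instead of the example list, for instance `with open("tide-index.peptides.txt", newline="") as fh: targets = read_target_sequences(fh)`.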

python pepsim.py -h

Note that sequences from the first input file appear in the first column of the output, and sequences from the second input file appear in the second column. Typically, the first input contains the relevant peptides and the second contains the irrelevant peptides. The output is printed to the console, so be sure to save it by redirecting it to a file. For example:

python pepsim.py --mz-thresh 50 file1 file2 >log.txt

Once pepsim.py finishes (note that it can take a while to run), the unique set of peptides found in the second column of the output is the set of "neighbor peptides". Concatenate this set of peptide sequences with the unique set of "relevant sequences" generated by tide-index to form the database that will be used in subset-neighbor-search. Using this new database, perform a database search with your favorite database search engine. Then filter out any PSMs (both target and decoy) that match a "neighbor peptide". Finally, estimate the FDR on the remaining set of PSMs. Note that it is important to filter out the neighbor peptide PSMs before FDR estimation.
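The post-processing steps above can be sketched in Python. This is an illustration under stated assumptions, not the repository's own code: it assumes each pepsim.py output line is whitespace-delimited with the two sequences in the first two fields and no header line, that each PSM is a dict with the hypothetical keys "target_peptide" (for a decoy PSM, the target the decoy was derived from) and "is_decoy", and that the FDR among an already-thresholded list of PSMs is estimated as (decoys + 1) / targets, which is one common target-decoy estimator and not necessarily the one you will want to use.

```python
from typing import Dict, Iterable, List, Set


def neighbor_peptides(pepsim_lines: Iterable[str]) -> Set[str]:
    """Unique sequences from the second column of the pepsim.py output."""
    neighbors = set()
    for line in pepsim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            neighbors.add(fields[1])
    return neighbors


def build_database(relevant: Iterable[str], neighbors: Iterable[str]) -> List[str]:
    """Relevant peptides plus neighbor peptides, de-duplicated and sorted."""
    return sorted(set(relevant) | set(neighbors))


def drop_neighbor_psms(psms: List[Dict], neighbors: Set[str]) -> List[Dict]:
    """Remove target and decoy PSMs that match a neighbor peptide."""
    return [p for p in psms if p["target_peptide"] not in neighbors]


def estimate_fdr(psms: List[Dict]) -> float:
    """Assumed estimator: (decoys + 1) / targets, capped at 1."""
    decoys = sum(1 for p in psms if p["is_decoy"])
    targets = len(psms) - decoys
    return min(1.0, (decoys + 1) / targets) if targets else 1.0


# Made-up data for illustration only.
pepsim_output = ["PEPTIDEK\tPEPTLDEK", "SAMPLER\tSAMPIER", "PEPTIDEK\tPEPTLDEK"]
neighbors = neighbor_peptides(pepsim_output)
database = build_database(["PEPTIDEK", "SAMPLER"], neighbors)

psms = [
    {"target_peptide": "PEPTIDEK", "is_decoy": False},
    {"target_peptide": "PEPTLDEK", "is_decoy": False},  # neighbor target
    {"target_peptide": "SAMPIER", "is_decoy": True},    # decoy of a neighbor
    {"target_peptide": "SAMPLER", "is_decoy": False},
]
kept = drop_neighbor_psms(psms, neighbors)  # filter first ...
fdr = estimate_fdr(kept)                    # ... then estimate the FDR
```

The order of the last two calls mirrors the workflow: neighbor PSMs are removed before any FDR estimate is computed.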

Congrats, you have successfully run subset-neighbor-search!

Citing

If you use subset-neighbor-search in your work, please cite:

https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.1c00483

Dependencies

Subset-neighbor-search requires the following:

  • Python 3
  • pyteomics
