CSV result
A typical reports folder contains the following result files. All file names start with the prefix database_name_search_date. For example, uniprot-ecoli-20171023_2017.12.22 means that the data were searched against the uniprot-ecoli-20171023 database and that the search date is 2017.12.22.
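The prefix convention above can be taken apart programmatically. The following is a minimal sketch, assuming the search date always follows the last underscore and is written as YYYY.MM.DD, as in the example; the function name `split_result_prefix` is hypothetical and not part of pLink2.

```python
from datetime import date, datetime


def split_result_prefix(prefix: str) -> tuple[str, date]:
    """Split 'database_name_search_date' into (database_name, search_date)."""
    # Split on the LAST underscore, because the database name itself
    # may contain underscores.
    database_name, _, date_text = prefix.rpartition("_")
    if not database_name:
        raise ValueError(f"no '_' separator in prefix: {prefix!r}")
    # Assumption: the date is formatted as YYYY.MM.DD, as in the example.
    return database_name, datetime.strptime(date_text, "%Y.%m.%d").date()


db, searched = split_result_prefix("uniprot-ecoli-20171023_2017.12.22")
print(db, searched.isoformat())
```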
There are mainly four types of result files:
- uniprot-ecoli-20171023_2017.12.22.csv, the one with the shortest file name, contains all unfiltered PSMs, one PSM per line. A PSM may be cross-linked, loop-linked, mono-linked, or regular.
- uniprot-ecoli-20171023_2017.12.22.filtered_X_Y.csv contains the filtered results for each peptide type (X) at each level (Y). X can be cross-linked, loop-linked, mono-linked, or regular; Y can be spectra, peptides, or sites.
- uniprot-ecoli-20171023_2017.12.22.precursor_error_distribution.csv and uniprot-ecoli-20171023_2017.12.22.filtered_precursor_error_distribution.csv contain the precursor errors of the unfiltered and filtered PSMs, respectively. They are visualized in the web page result, so they can be skipped when reading this page.
- uniprot-ecoli-20171023_2017.12.22.summary.txt contains summary information about the search, such as the number of identified PSMs and the search time.
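The naming scheme of the four file types can be spelled out in a few lines of code. This is only a sketch of the pattern described above: the helper `expected_result_files` is hypothetical, and it assumes every combination of X and Y exists, which may not hold for every search.

```python
PEPTIDE_TYPES = ("cross-linked", "loop-linked", "mono-linked", "regular")  # X
LEVELS = ("spectra", "peptides", "sites")                                  # Y


def expected_result_files(prefix: str) -> list[str]:
    """List the file names that follow the naming scheme for one search."""
    names = [f"{prefix}.csv"]  # unfiltered PSMs
    # One filtered file per peptide type and level (assumed: all combinations).
    names += [f"{prefix}.filtered_{x}_{y}.csv" for x in PEPTIDE_TYPES for y in LEVELS]
    names += [
        f"{prefix}.precursor_error_distribution.csv",
        f"{prefix}.filtered_precursor_error_distribution.csv",
        f"{prefix}.summary.txt",
    ]
    return names


for name in expected_result_files("uniprot-ecoli-20171023_2017.12.22"):
    print(name)
```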
uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv contains all cross-linked PSMs that pass the TDA-FDR filter, with decoy results removed, one PSM per line. uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_peptides.csv and uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_sites.csv are inferred directly from uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv.
There are 21 columns in uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv:
- Order: the order of the PSMs, starting from 1.
- Title: the title of the spectrum. If a RAW file is used, the title follows the scheme RAWName.Scan.Scan.Charge.pParseID.dta. For example, RD_pH_8point3_step2.7566.7566.3.0.dta means MS2 scan 7566 from the RAW file RD_pH_8point3_step2, with charge 3 and pParseID 0. The pParseID is the rank of the precursor extracted from MS1 by pParse; the lower the value, the higher the credibility, and 0 is the best. For more details about pParse, please see pParse.
- Charge: the charge of the spectrum.
- Precursor_Mass: the experimental [MH+] of the precursor.
- Peptide: the identified peptide sequence. AKLESLVEDLVNR(2)-HMNIKVTR(5) means that peptide AKLESLVEDLVNR is cross-linked to HMNIKVTR at sites 2 and 5, respectively. Mono-linked and loop-linked peptides have one or two linked sites, respectively, on a single peptide.
- Peptide_Type: the peptide type of the identification. It can be Cross-Linked, Loop-Linked, Mono-Linked, or Regular/Common.
- Linker: the name of the identified cross-linker. For regular results, it is null.
- Peptide_Mass: the theoretical [MH+] of the peptide.
- Modifications: the modifications identified on the peptide. For example, Carbamidomethyl[C](6) means that Carbamidomethyl occurs at the 6th site, which is a cysteine. Multiple modifications are separated by semicolons. null means no modifications.
- Evalue: the E-value for the entire peptide (or peptide pair); the smaller, the more confident.
- Score: the SVM score of the peptide; the smaller, the more confident. It is the primary measure for FDR estimation.
- Precursor_Mass_Error(Da): the precursor mass error in Da.
- Precursor_Mass_Error(ppm): the precursor mass error in ppm.
- Proteins: the proteins inferred from the peptide. For example, sp|P0A6Y8|DNAK_ECOLI (304)-sp|P0A6Y8|DNAK_ECOLI (299)/ means that sp|P0A6Y8|DNAK_ECOLI is cross-linked to sp|P0A6Y8|DNAK_ECOLI at sites 304 and 299, respectively. If more than one protein pair is inferred, the pairs are separated by slashes.
- Protein_Type: the protein type of the identification, that is, whether it is an Intra-protein or an Inter-protein cross-link. For mono-linked, loop-linked, and regular results, it is None.
- FileID: the RAW file from which the PSM was identified, starting from 1. The ID of a RAW file is determined by the order in which the files were added. The mapping between RAW files and FileIDs is shown in the parameter file.
- LabelID: the ID of the labeling used in quantitation, starting from 1. The mapping between labelings and LabelIDs is shown in the parameter file.
- Alpha_Matched: the number of matched fragment ions for the alpha peptide.
- Beta_Matched: the number of matched fragment ions for the beta peptide. Let Alpha_Num and Beta_Num be the numbers of peaks matched to the alpha peptide and the beta peptide, respectively. Some peaks may match both peptides; let Share_Num be the number of such shared peaks. Then the final Alpha_Matched = Alpha_Num - 0.5*Share_Num and Beta_Matched = Beta_Num - 0.5*Share_Num. As a result, fractional values such as 1.5 or 0.5 may appear.
- Alpha_Evalue: the E-value for the alpha peptide only; the smaller, the more confident.
- Beta_Evalue: the E-value for the beta peptide only; the smaller, the more confident.
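Several of the columns above pack more than one value into a single string. The sketch below shows one way to unpack the Title, Peptide, and Modifications fields and to reproduce the Alpha_Matched and Beta_Matched arithmetic. All function names are hypothetical, and the parsers assume exactly the formats shown in the examples above (in particular, the Peptide parser handles only the cross-linked form).

```python
import re
from typing import NamedTuple


class SpectrumTitle(NamedTuple):
    raw_name: str
    scan: int
    charge: int
    pparse_id: int


def parse_title(title: str) -> SpectrumTitle:
    """Parse a title of the form RAWName.Scan.Scan.Charge.pParseID.dta."""
    # Split from the right so that dots inside the RAW name are preserved.
    raw_name, scan, _scan_repeat, charge, pparse_id, ext = title.rsplit(".", 5)
    if ext != "dta":
        raise ValueError(f"unexpected title extension: {ext!r}")
    return SpectrumTitle(raw_name, int(scan), int(charge), int(pparse_id))


_PEPTIDE_SITE = re.compile(r"([A-Z]+)\((\d+)\)")


def parse_cross_linked_peptide(peptide: str) -> list[tuple[str, int]]:
    """Parse 'SEQ1(site1)-SEQ2(site2)' into [(SEQ1, site1), (SEQ2, site2)]."""
    pairs = [(seq, int(site)) for seq, site in _PEPTIDE_SITE.findall(peptide)]
    if len(pairs) != 2:
        raise ValueError(f"not a cross-linked peptide pair: {peptide!r}")
    return pairs


_MODIFICATION = re.compile(r"(.+)\[(.+)\]\((\d+)\)")


def parse_modifications(field: str) -> list[tuple[str, str, int]]:
    """Parse 'Name[Residue](site);...' into (name, residue, site) tuples."""
    if field == "null":  # 'null' means no modifications
        return []
    result = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        match = _MODIFICATION.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized modification: {item!r}")
        name, residue, site = match.groups()
        result.append((name, residue, int(site)))
    return result


def matched_counts(alpha_num: int, beta_num: int, share_num: int) -> tuple[float, float]:
    """Each shared peak counts as half a match for alpha and half for beta."""
    return alpha_num - 0.5 * share_num, beta_num - 0.5 * share_num


print(parse_title("RD_pH_8point3_step2.7566.7566.3.0.dta"))
print(parse_cross_linked_peptide("AKLESLVEDLVNR(2)-HMNIKVTR(5)"))
print(parse_modifications("Carbamidomethyl[C](6)"))
print(matched_counts(2, 1, 1))  # (1.5, 0.5)
```

The last call illustrates why half-integer values appear: with 2 alpha peaks, 1 beta peak, and 1 shared peak, the shared peak is split evenly between the two peptides.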
By default, pLink2 does not calculate the three E-values (Evalue, Alpha_Evalue, and Beta_Evalue); in that case, all E-values are 1. If the Compute E-value checkbox in the Identification panel is selected, pLink2 calculates the three E-values only for PSMs that pass the FDR threshold. The smaller the E-value, the more confident the identification; it is similar to the score in pLink1.
The columns at the other two levels (peptides and sites) have the same meaning as at the spectra level described above.
From experience with pLink1, a PSM with an E-value below 1E-2 or 1E-3 is good. pLink2 uses SVM scores to estimate FDR; because SVM scores vary between datasets, there is no comparable fixed threshold for them. A Spectrum_Number of at least 2 or 3 can be a good indicator of a confident cross-linked site. The Spectrum_Number of a cross-linked site is the number of PSMs that support it, and it can be found in the *.filtered_cross-linked_sites.csv file.
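These rules of thumb are easy to apply once the rows are loaded. The following is a minimal sketch, assuming each row is already available as a dictionary keyed by column name; how the rows are read from the files is left open here, and the function names and sample rows are hypothetical.

```python
def confident_psms(rows, max_evalue=1e-2):
    """Keep PSMs whose E-value is below the threshold (1E-2 or 1E-3 suggested)."""
    # Note: if E-values were not computed, every Evalue is 1 and nothing passes.
    return [row for row in rows if float(row["Evalue"]) < max_evalue]


def confident_sites(rows, min_spectra=2):
    """Keep cross-linked sites supported by at least `min_spectra` PSMs."""
    return [row for row in rows if int(row["Spectrum_Number"]) >= min_spectra]


# Hypothetical rows, for illustration only.
psm_rows = [
    {"Order": "1", "Evalue": "3.2e-05"},
    {"Order": "2", "Evalue": "0.4"},
    {"Order": "3", "Evalue": "1"},
]
site_rows = [
    {"Order": "1", "Spectrum_Number": "5"},
    {"Order": "2", "Spectrum_Number": "1"},
]

print([row["Order"] for row in confident_psms(psm_rows)])
print([row["Order"] for row in confident_sites(site_rows)])
```

Both thresholds are parameters because the text gives a range (1E-2 or 1E-3, and 2 or 3) instead of a single cutoff.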
Because the unfiltered CSV contains unfiltered PSMs, it contains some additional columns, and some columns are numerically encoded:
- Peptide_Type: the same as Peptide_Type at the spectra level described above, but encoded as 0 for Regular/Common, 1 for Mono-Linked, 2 for Loop-Linked, and 3 for Cross-Linked.
- Refined_Score: the refined score calculated by the KSDP algorithm.
- SVM_Score: the same as Score at the spectra level described above.
- Target_Decoy: whether the identification is a target or a decoy: 0 for Decoy-Decoy, 1 for Target-Decoy (or Decoy-Target), and 2 for Target-Target.
- Q-value: the smoothed FDR value.
- Protein_Type: the same as Protein_Type at the spectra level described above, but encoded as 0 for Regular/Common, 1 for Intra-protein, and 2 for Inter-protein.
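The numeric codes in the unfiltered CSV can be translated back into the labels used in the filtered files. This is a small sketch, assuming a row is available as a dictionary keyed by column name; `decode_unfiltered_row` is a hypothetical helper, and the mappings are copied from the list above.

```python
PEPTIDE_TYPE = {0: "Regular/Common", 1: "Mono-Linked", 2: "Loop-Linked", 3: "Cross-Linked"}
TARGET_DECOY = {0: "Decoy-Decoy", 1: "Target-Decoy", 2: "Target-Target"}
PROTEIN_TYPE = {0: "Regular/Common", 1: "Intra-protein", 2: "Inter-protein"}

_CODED_COLUMNS = {
    "Peptide_Type": PEPTIDE_TYPE,
    "Target_Decoy": TARGET_DECOY,
    "Protein_Type": PROTEIN_TYPE,
}


def decode_unfiltered_row(row: dict) -> dict:
    """Return a copy of the row with the coded columns replaced by labels."""
    decoded = dict(row)
    for column, mapping in _CODED_COLUMNS.items():
        if column in row:
            decoded[column] = mapping[int(row[column])]
    return decoded


# Hypothetical row, for illustration only.
row = {"Peptide_Type": "3", "Target_Decoy": "2", "Protein_Type": "1"}
print(decode_unfiltered_row(row))
```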