Question on the output of MoDec #1

Open
winnieWei123456 opened this issue Feb 27, 2024 · 4 comments
Labels
question Further information is requested

Comments

@winnieWei123456

winnieWei123456 commented Feb 27, 2024

Dear developers,

Hello. I have been using MoDec v1.2 and would like to know how the "Score" in the output file "Responsibilities/bestPepResp_....txt" is calculated; I couldn't find the calculation method in the original paper. Additionally, could you explain the parameter "-S 0, --Salign 0"? I understand that it computes the binding core offset starting from different positions, and that setting it to zero takes the central amino acid of the peptide as the center. However, I would like to know how the calculation for the different offsets relates to the weights.

I look forward to your response. Thank you very much.

@jracle85
Member

jracle85 commented Mar 1, 2024

Dear Winnie Wei,

This score corresponds to the term
$$\sum_{k=0}^K \sum_{s=-S}^S w_{k,s}\prod_{l=1}^L \frac{\theta_{l,x_{l\oplus s}^n}^k}{f_{x_{l\oplus s}^n}}$$
which appears in equation (1) of our paper.

Concerning -S 0 (or --Salign 0): this indeed corresponds to a central alignment of the peptides. With this setting, the score is computed as in the equation above (eq. 1 of the paper): the sum over $s$ runs from $-S$ to $S$, and $x_{l\oplus s}$ uses the "special sum" defined in equation N1 of the Supplementary Note of our paper. With -S 1, you would instead redefine $S = \max_n(\lambda^n) - L$, the sum over $s$ in eq. (1) would run from $0$ to $S$, and the index would be a simple $x_{l + s}$. The idea is similar for -S 2, where the offsets are counted from the C-terminus of the peptide. This has implications for the $w_{k,s}$ term because the peptides do not all have the same length: a peptide of length $L$, for example, only contributes to the $w_{k,0}$ term, while a longer peptide contributes to multiple $w_{k,s}$ terms.
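To make the role of the offsets concrete, here is a minimal Python sketch of the -S 1 style score, where the index is a simple $x_{l+s}$ and $s$ runs from $0$ to the largest offset the peptide allows. It is not MoDec's code: the function name `alignment_score`, the data layout (`theta` as a list of motifs, each a list of per-position dictionaries) and the two-letter toy alphabet are all assumptions made for illustration. The central alignment (-S 0) is left out because it relies on the "special sum" of equation N1, which is not reproduced in this thread. The sketch also simply skips offsets that do not fit in the peptide, without renormalising the weights.

```python
from math import prod


def alignment_score(peptide, theta, w, f):
    """Sum over motifs k and offsets s of w[k][s] * prod_l theta[k][l][x_{l+s}] / f[x_{l+s}].

    peptide: amino acid string.
    theta:   list of motifs; each motif is a list of L dicts {amino acid: frequency}.
    w:       w[k][s] is the weight of motif k at offset s (hypothetical layout).
    f:       background frequency of each amino acid.
    """
    total = 0.0
    for k, motif in enumerate(theta):
        core_len = len(motif)
        # Largest offset at which a core of this length still fits in the peptide.
        max_offset = len(peptide) - core_len
        for s in range(min(max_offset, len(w[k]) - 1) + 1):
            core = peptide[s:s + core_len]
            ratio = prod(motif[l][aa] / f[aa] for l, aa in enumerate(core))
            total += w[k][s] * ratio
    return total


# Toy example: alphabet {A, B}, one motif with core length L = 2.
f = {"A": 0.5, "B": 0.5}
theta = [[{"A": 0.8, "B": 0.2}, {"A": 0.1, "B": 0.9}]]
w = [[0.6, 0.4]]

short_score = alignment_score("AB", theta, w, f)   # only offset 0 fits
long_score = alignment_score("ABB", theta, w, f)   # offsets 0 and 1 both fit
```

In the toy example the peptide of length $L$ only picks up the $w_{k,0}$ term, while the longer peptide also adds a $w_{k,1}$ term, which is the effect described above.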

Best regards,

Julien

jracle85 added the "question" label Mar 1, 2024
@winnieWei123456
Author

winnieWei123456 commented Mar 12, 2024 via email

@jracle85
Member

Hello Winnie Wei,

Good to hear that you are using our tool. I'll try to answer your questions below.

  1. f_i was indeed obtained from our curated MS data, in two steps. I first estimated the binding core positions with MoDec, using the amino acid frequencies from our full MS data. I then recomputed the frequencies from our HLA-II MS data after removing the first 3 AAs, the last 3 AAs and the 9 AAs predicted to be in the binding core (to remove the bias caused by, for example, some AAs being seen preferentially at binding anchor positions). The values currently used are listed below (I didn't update them based on our more recent dataset, but that wouldn't change them much and wouldn't have a big impact):
    f['A'] = 0.0793; f['C'] = 0.00257; f['D'] = 0.06849; f['E'] = 0.09386; f['F'] = 0.02431; f['G'] = 0.07418; f['H'] = 0.02733; f['I'] = 0.03553; f['K'] = 0.08251; f['L'] = 0.05667; f['M'] = 0.01006; f['N'] = 0.04149; f['P'] = 0.0682; f['Q'] = 0.05515; f['R'] = 0.06199; f['S'] = 0.07242; f['T'] = 0.05682; f['V'] = 0.05747; f['W'] = 0.00721; f['Y'] = 0.02443;

  2. I don't see the attached logo images, but it doesn't matter. Both approaches indeed give very similar results. In the MoDec report, I directly use the position frequency matrices found in the PWM folder of the MoDec results (you can check the file make_logo_report.R). These do include the peptide similarity weighting. On the other hand, when I show a final figure of the motif found for one allele after grouping data from multiple samples, I use the binding core sequences from MoDec without weighting peptides by their similarity. (Whether or not peptides are weighted has only a very small influence, barely visible on the logos, and the pseudocount addition generally won't change much either if you have many sequences; what is probably more important is to use the same procedure for GibbsCluster if you want to compare with it.)

  3. and 4. I think you got the general idea for these two parameters (theta includes the sequence weighting and the peptide responsibility of each possible binding core of the peptide). But I'm sorry, this is too lengthy to derive here. If you want to derive how these values are computed, you should check the equations in my paper and its SI, together with the derivation in Bishop cited there, and follow the same steps as in Bishop starting from my reformulated equations.
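As a small illustration of points 1 and 2, the Python sketch below stores the background frequencies listed in point 1 and builds an unweighted position frequency matrix from a list of binding core sequences. The function name `position_frequencies` and the background-scaled pseudocount are assumptions chosen for illustration; they are not taken from MoDec or make_logo_report.R, and MoDec's own pseudocount scheme may differ.

```python
from collections import Counter

# Background frequencies f as listed in point 1 above.
BACKGROUND = {
    "A": 0.0793, "C": 0.00257, "D": 0.06849, "E": 0.09386, "F": 0.02431,
    "G": 0.07418, "H": 0.02733, "I": 0.03553, "K": 0.08251, "L": 0.05667,
    "M": 0.01006, "N": 0.04149, "P": 0.0682, "Q": 0.05515, "R": 0.06199,
    "S": 0.07242, "T": 0.05682, "V": 0.05747, "W": 0.00721, "Y": 0.02443,
}


def position_frequencies(cores, background, pseudocount=0.0):
    """Unweighted per-position amino acid frequencies of equal-length cores.

    The pseudocount is spread over amino acids in proportion to the
    background frequencies (an illustrative choice, not MoDec's scheme).
    """
    if not cores:
        raise ValueError("at least one core sequence is required")
    length = len(cores[0])
    if any(len(core) != length for core in cores):
        raise ValueError("all core sequences must have the same length")
    n = len(cores)
    matrix = []
    for pos in range(length):
        counts = Counter(core[pos] for core in cores)
        matrix.append({
            aa: (counts[aa] + pseudocount * bg) / (n + pseudocount)
            for aa, bg in background.items()
        })
    return matrix


# Toy example with two 3-residue "cores" (real MoDec cores are 9 AAs long).
pfm_raw = position_frequencies(["AAA", "AAC"], BACKGROUND)
pfm_smoothed = position_frequencies(["AAA", "AAC"], BACKGROUND, pseudocount=1.0)
```

With only two sequences the pseudocount visibly changes the frequencies; with many sequences its effect shrinks, in line with the remark in point 2.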

Best regards,

Julien

@winnieWei123456
Author

winnieWei123456 commented Mar 20, 2024 via email
