Update AlphaFold2_how_to_guide.md #23

Open
wants to merge 1 commit into
base: main
8 changes: 4 additions & 4 deletions protein_struct_pred/AlphaFold2_how_to_guide.md
@@ -13,7 +13,7 @@ toc: true

To do this there are 3 main sections;

-1. [AlphaFold2 - the basics](#alphafold2---the-basics-)
+1. [AlphaFold2 - the basics](#alphafold2---the-basics)
2. [Running a Structure Prediction](#running-a-structure-prediction), and
3. [Interpreting the Output Files](#interpreting-the-output-files)

Expand Down Expand Up @@ -121,12 +121,12 @@ The final plot to look at if we are going to understand the structure prediction

![](images/AF2_How-to_images/Image-09.png)

-To highlight the interpretation of PAE plots further, let's also look at the *DNA damage checkpoint protein 1* AF output data to see why this plot is important. If we had opened the predicted protein model (on the right) and coloured it by the pLDDT score we would see an image like this:
+To highlight the interpretation of PAE plots further, let's also look at the *DNA damage checkpoint protein 1* AF output data to see why this plot is important. If we had opened the predicted protein model (on the left) and coloured it by the pLDDT score we would see an image like this:

![](images/AF2_How-to_images/Image-10.png)


-You can see this model is predicted with hgih confidence to contain two structured domains (dark blue) and some low confidence disordered regions (orange). Now if we were just presented with this structure we might be tempted to think that these two domains (one which is at the N-terminus and one near the C-terminus) are forming an intramolecular interaction as AF has packed them very close together. But one quick look at the PAE plot (on the left) will show that there is no data suggesting that these two domains interact together . You can see we get the line down the middle as per usual and then two boxes at either end of the protein indicating the two discrete structured domains. In particular the lack of blue in the top right corner and bottom left of the PAE plots (highlighted by the green boxes) indicates that AF2 has no confidence in the relative positioning of these two domains with respect to each other. Instead the closeness of the two domains is likely an artefact of a volume minimisation function within AF used to increase computational speed, or to put it another way AF2 compresses structures into the smallest volume possible to increase computational efficiency. The correct interpretation of this AF model is that this protein comprises two non-interacting structured domains connected by a long disordered region.
+You can see this model is predicted with high confidence to contain two structured domains (dark blue) and some low confidence disordered regions (orange). Now if we were just presented with this structure we might be tempted to think that these two domains (one which is at the N-terminus and one near the C-terminus) are forming an intramolecular interaction as AF has packed them very close together. But one quick look at the PAE plot (on the right) will show that there is no data suggesting that these two domains interact together. You can see we get the line down the middle as per usual and then two boxes at either end of the protein indicating the two discrete structured domains. In particular the lack of blue in the top right corner and bottom left of the PAE plots (highlighted by the green boxes) indicates that AF2 has no confidence in the relative positioning of these two domains with respect to each other. Instead the closeness of the two domains is likely an artefact of a volume minimisation function within AF used to increase computational speed, or to put it another way AF2 compresses structures into the smallest volume possible to increase computational efficiency. The correct interpretation of this AF model is that this protein comprises two non-interacting structured domains connected by a long disordered region.

Below is a table with other proteins and their PAE plots, clicking on the protein name will redirect you to the AlphaFold database which has a great interactive PAE tool. Clicking and dragging a box around an area of the PAE plot will lead to the same area being highlighted in the structure to the left. Where possible I have also included a reference to a paper which has used AlphaFold2 to describe the protein. Note the colours are slightly different with green showing low positional error and white showing high positional error.

@@ -152,5 +152,5 @@ As we come to the end of this "How-to Guide" there is one final exercise, which

The first thing to notice is that the conservation of this sequence is consistent across the whole trimeric structure. Likewise the pLDDT score is relatively high across all three chains, with the exception of the first \~200 or so aa residues in protein C (residue 450-650 as the residue numbering is continuous). This also means we can be pretty confident about the model that AF2 has produced. In the PAE plot we can see that all three proteins are single domain proteins with the third protein (C) having a short disordered N-terminal region. If we look in the cross boxes we can see that chain A and B are not positioned relative to one another. We can however see that chain A and B are accurately positioned relative to protein C. Closer inspection shows that the front half of the N-terminal disordered region of protein C is accurately positioned with respect to chain A, and that the model is confident of chain A’s positioning toward the C-terminus of protein C. Protein B on the other hand is confidently positioned relative to the N-terminus of protein C but not as confident with the C-terminus.

-Alright that's a lot of words, but lets now have a look at the structure and hopefully things will become a little clearer. The data we are looking at is from a protein complex called *Retriever*, where the red protein is protein A and is called VPS29. The green protein is protein B and is called VPS26C, and the orange protein is protein C and is called VPS35L. So we can see that each protein is well ordered and structured with the exception of the N-terminus of VPS35L (orange). We can also see that while protein A and B aren’t positioned relative to each other they are positioned relative to VPS35L. We can see that VPS26C interacts with the N-terminus of VPS35L while VPS29 interacts with the C-terminus. We can also see that a region of the disordered N-terminus of VPS35L makes contact with VPS29 and the C-terminus of VPS35L. Further explanation of this complex and these interactions is available [here](https://www.cell.com/cell/pdf/S0092-8674\(23\)00398-7.pdf).
+Alright that's a lot of words, but lets now have a look at the structure and hopefully things will become a little clearer. The data we are looking at is from a protein complex called *Retriever*, where the red protein is protein A and is called VPS29. The green protein is protein B and is called VPS26C, and the orange protein is protein C and is called VPS35L. So we can see that each protein is well ordered and structured with the exception of the N-terminus of VPS35L (orange). We can also see that while protein A and B aren’t positioned relative to each other they are positioned relative to VPS35L. We can see that VPS26C interacts with the N-terminus of VPS35L while VPS29 interacts with the C-terminus. We can also see that a region of the disordered N-terminus of VPS35L makes contact with VPS29 and the C-terminus of VPS35L. Further explanation of this complex and these interactions is available [here](https://doi.org/10.1016/j.cell.2023.04.003).
![](images/AF2_How-to_images/Image-13.png)
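
Note for readers of this diff (not part of the PR): the PAE reasoning in the changed paragraphs says that two regions are only confidently placed relative to each other when the off-diagonal blocks of the PAE plot show low error. A minimal Python sketch of that check follows. It assumes the PAE matrix has already been loaded as a square list of lists in ångströms; the function names, the toy matrix, and the 10 Å cutoff are all illustrative assumptions, not values from the guide or from AlphaFold2 itself.

```python
from statistics import mean


def block_mean_pae(pae, rows, cols):
    """Mean PAE over one rectangular block of the matrix.

    rows and cols are 0-based, half-open (start, end) residue ranges.
    """
    r0, r1 = rows
    c0, c1 = cols
    return mean(pae[i][j] for i in range(r0, r1) for j in range(c0, c1))


def relative_position_confident(pae, region_a, region_b, cutoff=10.0):
    """True if both off-diagonal blocks between two regions have low mean PAE.

    cutoff is a hypothetical threshold chosen only for illustration.
    """
    a_vs_b = block_mean_pae(pae, region_a, region_b)
    b_vs_a = block_mean_pae(pae, region_b, region_a)
    return max(a_vs_b, b_vs_a) < cutoff


# Toy 6-residue matrix: two confident 3-residue domains whose
# off-diagonal blocks carry high error (no inter-domain signal).
LOW, HIGH = 2.0, 28.0
toy_pae = [[LOW if (i < 3) == (j < 3) else HIGH for j in range(6)]
           for i in range(6)]

domain_1, domain_2 = (0, 3), (3, 6)
print(block_mean_pae(toy_pae, domain_1, domain_1))  # within-domain block
print(block_mean_pae(toy_pae, domain_1, domain_2))  # off-diagonal block
print(relative_position_confident(toy_pae, domain_1, domain_2))
```

Because residue numbering is continuous across chains in a multimer PAE plot, the same two functions could in principle be applied to chain ranges (for example chain A versus chain C) as well as to domains within one protein.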