Clarifying edits
clzirbel authored Oct 16, 2024
1 parent c625e25 commit 7f53f05
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions help.md
@@ -144,15 +144,15 @@ After loading the input page, it takes 2 seconds for the Submit button to change
The R3DMCS output page provides query information, a table of instances, a coordinate window, an interactive heat map, and a listing of nearby chains. Each row of the table lists one instance, and shows the PDB id, model number, chain, resolution, nearby chains, nucleotide numbers, and annotated pairwise interactions. The instances are ordered by geometric similarity so that instances that are more similar to each other are placed near one another in the table. The same ordering is used in the heatmap. The heatmap is interactive; clicking the heatmap selects instances, which are then marked in the table and are shown in the coordinate window. These features of the output page are explained in detail in the context of Example 1 below.

### Example 1: *E. coli* small decoding loop
-This example illustrates the dynamic nature of the decoding loop. During translation, the decoding loop in helix 44 of the small subunit ribosomal RNA makes contact with the mRNA to promote fidelity of translation. The contact is made by two adenine bases, often numbered 1492 and 1493, flipping out of the internal loop. When the mRNA is not present, the adenine bases typically stack inside the internal loop. We can see several different conformations of the internal loop with R3DMCS. We use internal loop IL_5J7L_060 from E. coli as the query. For this illustration, we use resolution threshold 3.0Å and retrieve corresponding instances across the equivalence class of *E. coli* small subunit ribosomal RNA 3D structures. See the [URL to produce the input page for Example 1](https://rna.bgsu.edu/correspondence/comparison?selection=IL_5J7L_060&resolution=3.0&scope=EC&input_form=True). The query takes around 30 seconds to return results on 149 corresponding instances.
+Example 1 illustrates the dynamic nature of the decoding loop. During translation, the decoding loop in helix 44 of the small subunit ribosomal RNA makes contact with the mRNA to promote fidelity of translation. The contact is made by two adenine bases, often numbered 1492 and 1493, flipping out of the internal loop. When the mRNA is not present, the adenine bases typically stack inside the internal loop. We can see several different conformations of the internal loop with R3DMCS. We use internal loop IL_5J7L_060 from E. coli as the query. For this illustration, we use resolution threshold 3.0Å and retrieve corresponding instances across the equivalence class of *E. coli* small subunit ribosomal RNA 3D structures. See the [URL to produce the input page for Example 1](https://rna.bgsu.edu/correspondence/comparison?selection=IL_5J7L_060&resolution=3.0&scope=EC&input_form=True). The query takes around 30 seconds to return results on 149 corresponding instances.
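
The input-page URL for Example 1 is a base address plus a handful of query parameters. As a minimal sketch, the snippet below assembles that URL in Python. The parameter names (`selection`, `resolution`, `scope`, `input_form`) are taken from the link above; the helper name `build_input_url` is hypothetical and not part of R3DMCS.

```python
from urllib.parse import urlencode

# Base address copied from the Example 1 link.
BASE_URL = "https://rna.bgsu.edu/correspondence/comparison"


def build_input_url(selection, resolution, scope, **extra):
    """Assemble an R3DMCS input-page URL (hypothetical helper).

    The parameter names mirror those visible in the Example 1 link;
    any extra keyword arguments are passed through unchanged.
    """
    params = {"selection": selection, "resolution": resolution, "scope": scope}
    params.update(extra)
    params["input_form"] = "True"  # request the input page rather than the results
    return f"{BASE_URL}?{urlencode(params)}"


example1_url = build_input_url("IL_5J7L_060", "3.0", "EC")
print(example1_url)
```

Keeping the parameters in a dictionary makes it easy to change one setting, such as the resolution threshold, without editing the URL by hand.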

#### Query information panel
-The upper left panel of the output page, shown below, shows basic information about the query and the corresponding instances. The query nucleotides come from PDB id 5J7L, model 1, chain AA. The standardized name of that chain is the small subunit ribosomal RNA, SSU for short. The query nucleotides are listed; note that residue 1407 is a modified C. Concatenating the PDB|Model|Chain with the query nucleotide sequence and number would give the full unit id, for example, 5J7L|1|AA|G|1405 for the first nucleotide. The Query Organism identifies the species of the PDB chain the query nucleotides are from. Since we chose to retrieve instances from across the equivalence class, the equivalence class identifier NR_3.0_56726.109 is shown; this indicates that the resolution threshold is 3.0Å and that the equivalence class with handle 56726 is on version 119, meaning that since the inception of this equivalence class, the membership has changed 119 times. This query has retrieved 134 instances, all of which are from *E. coli* small subunit ribosomal RNA 3D structures. In the all-against-all geometric comparison, the largest geometric discrepancy is 1.40, indicating a moderate level of geometric similarity even between the most dissimilar instances.
+The upper left panel of the output page, shown below, shows basic information about the query and the corresponding instances. The query nucleotides come from PDB id 5J7L, model 1, chain AA. The standardized name of that chain is the small subunit ribosomal RNA, SSU for short. The query nucleotides are listed; note that residue 1407 is a modified C. Concatenating the PDB|Model|Chain with the query nucleotide sequence and number would give the full unit id, for example, 5J7L|1|AA|G|1405 for the first nucleotide. The Query Organism identifies the species of the PDB chain the query nucleotides are from. Since we chose to retrieve instances from across the equivalence class, the equivalence class identifier NR_3.0_56726.119 is shown. The identifier indicates that the resolution threshold is 3.0Å and that the equivalence class with handle 56726 is on version 119, meaning that since the inception of this equivalence class, the membership has changed 119 times. (R3DMCS always queries the current version of an equivalence class; when this discussion was first written, the version was 119, but by now the version number is well beyond 119.) This query retrieved 134 instances, all of which are from *E. coli* small subunit ribosomal RNA 3D structures. In the all-against-all geometric comparison, the largest geometric discrepancy is 1.40, indicating a moderate level of geometric similarity even between the most dissimilar instances.
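
The two identifier formats described above can be illustrated with a short Python sketch. It assumes only the layouts given in the text: a unit id joins PDB id, model, chain, residue, and number with `|`, and an equivalence class identifier has the form `NR_<resolution>_<handle>.<version>`. The function and class names are hypothetical.

```python
from typing import NamedTuple


class EquivalenceClassId(NamedTuple):
    resolution: str  # resolution threshold, kept as text (e.g. "3.0")
    handle: str      # stable handle of the equivalence class
    version: int     # number of membership changes since inception


def parse_equivalence_class_id(identifier):
    """Split an identifier such as NR_3.0_56726.119 into its parts."""
    prefix, resolution, rest = identifier.split("_")
    if prefix != "NR":
        raise ValueError(f"unexpected prefix in {identifier!r}")
    handle, version = rest.split(".")
    return EquivalenceClassId(resolution, handle, int(version))


def make_unit_id(pdb, model, chain, residue, number):
    """Join PDB|Model|Chain with the residue and its number."""
    return "|".join(str(part) for part in (pdb, model, chain, residue, number))


ec = parse_equivalence_class_id("NR_3.0_56726.119")
first_nucleotide = make_unit_id("5J7L", 1, "AA", "G", 1405)
print(ec, first_nucleotide)
```

Because R3DMCS always queries the current version of an equivalence class, the handle is the part that stays fixed; the version parsed here is just the one quoted in this discussion.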

![Query information panel](/assets/query_panel.png)

#### Table of instances
-The table of instances in the center of the output page lists all 134 instances. In the image below, we show two rows of the table. The query instance is in row 4 and indicates the PDB, model, and chain to be 5J7L, 1, AA. The structure 5J7L was solved at 3.0Å resolution. The columns numbered 1, 2, 3, indicate the query nucleotides, starting with G|1405. The column labeled "Neighboring Protein/NA Chains" indicates chains which have at least one residue within 10Å of one of the nucleotides in the instance on that row. In 5J7L, that includes the LSU rRNA and the SSU protein uS12. The instance in row 52 of the table is from PDB structure 7M5D, solved at 2.8Å. Numbering in *E. coli* 3D structures is quite consistent from one structure to the next, so it is no surprise that the nucleotide numbers in the columns are the same. What differs the most is the nearby chains; in 7M5D three additional chains are nearby, namely Peptide chain release factor RF-1, a tRNA, and an mRNA; their chain identifiers are indicated at the beginning of each line. As we will explain below, the user can visualize residues from these chains in the coordinate window by clicking "Show neighborhood".
+The table of instances in the center of the output page lists all 134 instances. In the image below, we show two rows of the table. The query instance is in row 4 and indicates the PDB, model, and chain to be 5J7L, 1, AA. The structure 5J7L was solved at 3.0Å resolution. The columns numbered 1, 2, 3, indicate the query nucleotides, starting with G|1405. The column labeled "Neighboring Protein/NA Chains" indicates chains which have at least one residue within 10Å of one of the nucleotides in the instance on that row. In 5J7L, that includes the LSU rRNA and the SSU protein uS12. The instance in row 52 of the table is from PDB structure 7M5D, solved at 2.8Å. Numbering in *E. coli* 3D structures is quite consistent from one structure to the next, so it is no surprise that the nucleotide numbers in the columns are the same. What differs the most is the nearby chains; in 7M5D three additional chains are nearby, namely Peptide chain release factor RF-1, a tRNA, and an mRNA; their chain identifiers are indicated at the beginning of each line. As we will explain below, the user can visualize residues from these chains in the coordinate window by clicking "Show neighborhood". (Note: over time as more instances are added to the equivalence class, the row in which a specific instance will be listed is likely to change.)
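
The 10Å neighborhood rule behind the "Neighboring Protein/NA Chains" column can be sketched as a simple distance test. The text does not say which atoms R3DMCS measures between, so this sketch assumes that any atom of a chain within the cutoff of any atom of the instance counts; the function name, chain labels, and coordinates are made up for illustration.

```python
from math import dist


def neighboring_chains(instance_atoms, chain_atoms, cutoff=10.0):
    """Return names of chains with at least one atom within `cutoff` Å.

    instance_atoms: list of (x, y, z) coordinates for the instance.
    chain_atoms: dict mapping a chain name to a list of (x, y, z).
    Assumption: an any-atom-to-any-atom test; the real criterion may differ.
    """
    nearby = []
    for name, atoms in chain_atoms.items():
        if any(dist(a, b) <= cutoff for a in atoms for b in instance_atoms):
            nearby.append(name)
    return nearby


# Toy coordinates, not taken from any PDB entry.
instance = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
chains = {
    "uS12": [(3.0, 4.0, 0.0)],      # 5 Å from the first instance atom
    "far chain": [(40.0, 0.0, 0.0)],  # well outside the cutoff
}
nearby = neighboring_chains(instance, chains)
print(nearby)
```

The brute-force double loop is adequate for a single motif; a spatial index would be the usual choice for whole-structure searches.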

![Table of instances](/assets/image1.png)

@@ -168,7 +168,7 @@ One can click "Show neighborhood" to show all residues within 10 Ångstroms of t
#### Heatmap
All instances are compared to one another, all against all, and the geometric discrepancy is calculated. Geometric discrepancy is similar to RMSD, but allows for base substitutions. Geometric discrepancy has two contributions: the location error is the minimum RMSD between glycosidic atoms (N1/N9) when superimposed optimally, and orientation error accounts for different orientations of corresponding bases in the two instances. Units are Ångstroms per nucleotide. See [2008 article on FR3D](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837920/) for more details on the calculation.

-If there are N instances, the discrepancies form an N by N matrix, which is displayed as a heat map, with dark colors indicating small discrepancy and light (yellow) colors indicating large discrepancy. In order to make the heatmap easier to understand, the N instances are ordered in such a way that similar instances are put next to each other in the ordering. The heuristic for ordering is called tree-penalized path length (tpPL), which is described in detail in a [2023 article on data seriation](https://www.sciencedirect.com/science/article/pii/S037722172200501X). With this ordering and the heatmap coloring, clusters of similar instances become apparent, allowing the user to examine a small number of instances from each cluster to understand.
+If there are N instances, the discrepancies form an N by N matrix, which is displayed as a heat map, with dark colors indicating small discrepancy and light (yellow) colors indicating large discrepancy. In order to make the heatmap easier to understand, the N instances are ordered in such a way that similar instances are put near each other in the ordering. The heuristic for ordering is called tree-penalized path length (tpPL), which is described in detail in a [2023 article on data seriation](https://www.sciencedirect.com/science/article/pii/S037722172200501X). With this ordering and the heatmap coloring, clusters of similar instances become apparent, allowing the user to examine a small number of instances from each cluster to understand the variability in the motif.
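
To illustrate why ordering matters, the sketch below reorders a toy discrepancy matrix so that similar instances sit near each other. It does not implement tpPL: it substitutes a plain greedy nearest-neighbor ordering and scores it by ordinary path length (the sum of discrepancies between consecutive instances), with no tree penalty. The matrix values and function names are invented for the example.

```python
def path_length(order, disc):
    """Sum of discrepancies between consecutive instances in `order`."""
    return sum(disc[a][b] for a, b in zip(order, order[1:]))


def greedy_order(disc, start=0):
    """Greedy nearest-neighbor ordering (a stand-in, not the tpPL heuristic).

    Repeatedly appends the not-yet-placed instance closest to the last one;
    ties are broken by the lower index.
    """
    order = [start]
    remaining = set(range(len(disc))) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: (disc[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy symmetric matrix with two clusters: instances {0, 2} and {1, 3}.
disc = [
    [0.0, 1.2, 0.1, 1.3],
    [1.2, 0.0, 1.1, 0.2],
    [0.1, 1.1, 0.0, 1.0],
    [1.3, 0.2, 1.0, 0.0],
]
original = [0, 1, 2, 3]
ordered = greedy_order(disc)
print(ordered, path_length(ordered, disc), path_length(original, disc))
```

In the reordered sequence the members of each cluster become adjacent, which is what makes clusters show up as dark blocks along the diagonal of a heat map.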

The heatmap is interactive, in the sense that one can click on the heatmap to select instances, or select instances in the table and see where the instance falls on the heatmap. Briefly,
- Left clicking on the diagonal of the heatmap selects one instance
@@ -202,7 +202,7 @@ Below, we show one instance from the remaining large cluster of instances. This

![Heatmap6](/assets/heatmap6.png)

-Below, we use the colormap to identify a pair of instances with a large geometric discrepancy between them and then click below the diagonal to display them. In one, only A1493 is flipped out, and in the other, only A1492 is flipped out, creating a geometric discrepancy per nucleotide value of 1.1405Å.
+Below, we use the colormap to identify a pair of instances with a large geometric discrepancy between them (yellowish cell in the heatmap) and then click below the diagonal to display them. In one, only A1493 is flipped out, and in the other, only A1492 is flipped out, creating a geometric discrepancy per nucleotide value of 1.1405Å.

![Heatmap7](/assets/heatmap7.png)

@@ -212,15 +212,15 @@ The upper right corner of the output page lists all unique names of nearby chain
![Nearby chains listing](/assets/chains_count.png)

### Example 2: *E. coli* SSU h27 internal loop
-This [example](https://rna.bgsu.edu/correspondence/comparison?selection=IL_5AJ3_023&resolution=4.0&scope=Rfam&depth=1&input_form=true) studies an internal loop from the small subunit ribosomal RNA helix 27. The core of the loop is the same as the sarcin-ricin internal loop in Helix 95 of the large subunit ribosomal RNA, consisting of a GUA base triple. This recurrent internal loop motif is also called a G-bulge. This example compares instances of the loop across different species whose SSU chains map to Rfam family RF00177. The query loop is IL_5AJ3_023, which comes from chain A of PDB id 5AJ3, which is a small subunit ribosomal RNA from the mitochondrion of *Sus scrofa*. As it happens, there are other 3D structures of the same molecule from the same species, and 5AJ3|1|A is not the representative of the equivalence class of 3D structures, as we illustrate below by showing the table entry in the [Representative Set page](https://rna.bgsu.edu/rna3dhub/nrlist/release/3.332/4.0A) that contains 5AJ3:
+[Example 2](https://rna.bgsu.edu/correspondence/comparison?selection=IL_5AJ3_023&resolution=4.0&scope=Rfam&depth=1&input_form=true) studies an internal loop from the small subunit ribosomal RNA helix 27. The core of the loop is the same as the sarcin-ricin internal loop in Helix 95 of the large subunit ribosomal RNA, consisting of a GUA base triple. This recurrent internal loop motif is also called a G-bulge. This example compares instances of the loop across different species whose SSU chains map to Rfam family RF00177. The query loop is IL_5AJ3_023, which comes from chain A of PDB id 5AJ3, which is a small subunit ribosomal RNA from the mitochondrion of *Sus scrofa*. As it happens, there are other 3D structures of the same molecule from the same species, and 5AJ3|1|A is not the representative of the equivalence class of 3D structures, as we illustrate below by showing the table entry in the [Representative Set page](https://rna.bgsu.edu/rna3dhub/nrlist/release/3.332/4.0A) that contains 5AJ3:

![Representative entry](/assets/ec_example.png)

Note that 6GAZ is the representative structure. Note here that the chains in this equivalence class map to Rfam family RF00177, which Rfam labels as being bacterial SSU, but which mitochondrial and chloroplast ribosomes also match well, due to the ribosomes in those organelles originating from bacteria.

-Using 5AJ3 as a starting point in the query, R3DMCS maps its 18 nucleotides to other 3D structures which also map to [Rfam family RF00177](https://rfam.org/family/SSU_rRNA_bacteria). The query has depth=1, so only on structure, the representative structure, from each equivalence class is returned. Thus an instance from 6GAZ appears in the output page, not the query from 5AJ3. The query takes about 8 seconds to return results on 27 corresponding instances.
+Using 5AJ3 as a starting point in the query, R3DMCS maps its 18 nucleotides to other 3D structures which also map to [Rfam family RF00177](https://rfam.org/family/SSU_rRNA_bacteria). The query has depth=1, so only one structure, the representative structure, from each equivalence class is returned. Thus an instance from 6GAZ appears in the output page, not the query from 5AJ3. The query takes about 8 seconds to return results on 27 corresponding instances.

-This loop is particularly interesting, because the heat map shows four structures that are quite distinct from the rest, see below where we have selected the instance from 6GAZ and the instance from 5J7L, which is from *E. coli* as in previous examples. The key difference is that the four structures in the lower right of the heat map are all mitochondrial ribosomes, in which position 4 in the sequence is C, whereas the other structures all have G in that position. This example shows that when the G in the base triple in the G-bulge changes to C, the base triple is lost, and the C bulges out of the motif. Apparently that is not a problem in some mitochondria, but all bacteria in the 3D structure database have G in that position, and the G participates in the base triple.
+This loop is particularly interesting, because the heat map shows four structures that are quite distinct from the rest, see below where we have selected the instance from 6GAZ and the instance from 5J7L, which is from *E. coli* as in previous examples. The key difference is that the four structures in the lower right of the heat map are all mitochondrial ribosomes, in which position 4 in the sequence is C, whereas the other structures all have G in that position. This example shows that when the G in the base triple in the G-bulge changes to C, the base triple is lost, and the C bulges out of the motif. Apparently that is not a problem in some mitochondria, but all bacteria and some mitochondria in the 3D structure database have G in that position, and the G participates in the base triple.

![Example2](/assets/example2.png)
