mihailefter edited this page May 24, 2019 · 1 revision

Mutalyzer Name Checker Help

The Mutalyzer Name Checker can check the correctness of a variant description under the following conditions:

  1. The Syntax Checker is able to parse the description.
  2. A valid Reference Sequence record is provided.
  3. The reference sequence record contains all the sequence affected by the variant description.
  4. The reference sequence record annotation contains sufficient information to support the selected position numbering scheme.
  5. The semantic nomenclature rules applicable to the variant description are supported by the Name Checker.

Variant description format

If you are not familiar with the HGVS standard human sequence variant nomenclature, try the Name Generator first or check Variant Descriptions.

The Name Checker expects sequence variant descriptions in the following format:

<accession number>.<version number>:<sequence type>.<variant>

Example

  • NM_003002.1:c.5delC
  • AL449423.14:g.61866_85191del

If the reference sequence record contains multiple genes, transcript variants or protein isoforms and position numbering becomes ambiguous, this format is extended to:

<accession number>.<version number><(Gene Symbol)>:<sequence type>.<variant>

The gene symbol has to be extended with transcript variant or protein isoform numbers (e.g., _v001 or _i001, respectively), if multiple transcript variants or protein isoforms are annotated.
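The two formats above can be taken apart with a small regular expression. The following is a minimal sketch, not Mutalyzer's own grammar: the pattern, the `Description` tuple and `parse_description` are hypothetical names introduced for illustration, and the pattern only covers the `<accession number>.<version number>` layout described here (identifiers without a version number are rejected).

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for the documented layout; this is not Mutalyzer's parser.
DESCRIPTION_RE = re.compile(
    r"^(?P<accession>[A-Za-z_]+\d+)"   # e.g. NM_003002 or AL449423
    r"\.(?P<version>\d+)"              # e.g. .1 or .14
    r"(?:\((?P<selector>[^()]+)\))?"   # optional (Gene Symbol), e.g. (CDKN2A_v001)
    r":(?P<seq_type>[a-z])"            # sequence type: c, g, n, ...
    r"\.(?P<variant>.+)$"              # the variant itself
)

# Gene symbol, optionally followed by _v<number> or _i<number>.
SELECTOR_RE = re.compile(r"^(?P<gene>.+?)(?:_(?P<kind>[vi])(?P<number>\d+))?$")


class Description(NamedTuple):
    accession: str
    version: int
    gene: Optional[str]
    selector_kind: Optional[str]    # "v" = transcript variant, "i" = protein isoform
    selector_number: Optional[int]
    seq_type: str
    variant: str


def parse_description(text: str) -> Description:
    """Split a description into its parts, or raise ValueError if it does not fit."""
    match = DESCRIPTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in the expected format: {text!r}")
    gene = kind = number = None
    if match["selector"] is not None:
        selector = SELECTOR_RE.match(match["selector"])
        if selector is None:
            raise ValueError(f"unreadable gene symbol in: {text!r}")
        gene = selector["gene"]
        kind = selector["kind"]
        number = int(selector["number"]) if selector["number"] else None
    return Description(
        match["accession"], int(match["version"]),
        gene, kind, number,
        match["seq_type"], match["variant"],
    )


print(parse_description("NM_003002.1:c.5delC"))
print(parse_description("AL449423.14(CDKN2A_v002):c.5_400del"))
```

The sketch only splits the string; it does not check that the accession exists or that the variant part is valid, which is the job of the Syntax Checker and the Name Checker.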

Example

The genomic description AL449423.14:g.61866_85191del is equivalent to the following unambiguous descriptions:

  • 8 descriptions relative to CDKN2A transcript variants:

    • AL449423.14(CDKN2A_v001):c.-271-u19352_234del
    • AL449423.14(CDKN2A_v002):c.5_400del
    • AL449423.14(CDKN2A_v003):n.1-u19623_508del
    • AL449423.14(CDKN2A_v004):n.42_437del
    • AL449423.14(CDKN2A_v005):n.449+371_705del
    • AL449423.14(CDKN2A_v006):n.481+371_565del
    • AL449423.14(CDKN2A_v007):n.53+371_859+d18212del
    • AL449423.14(CDKN2A_v008):n.1-u23242_84del
  • 1 description relative to an MTAP transcript variant:

    • AL449423.14(MTAP_v005):n.*60994-u23670_*60994-u345del
  • 2 descriptions relative to CDKN2B transcript variants:

    • AL449423.14(CDKN2B_v001):c.*3084+d8453_*3084+d31778del
    • AL449423.14(CDKN2B_v002):c.*303+d11537_*303+d34862del
  • 1 description relative to a C9orf53 transcript variant:

    • AL449423.14(C9orf53_v001):c.*312+d3374_*312+d26699del

Name Checker output

The Name Checker will try to regenerate the variant sequence and apply the semantic rules of the HGVS standard human sequence variant nomenclature to name it accordingly.

Before presenting the results of the analysis, the Mutalyzer Name Checker issues warnings when it corrects entries, encounters inconsistencies or incomplete sequences or annotation, or identifies variations with potential effects on splicing. Errors are generated when the entries cannot be processed properly (see the conditions mentioned above).

Click the link below for a Name Checker output example:

General output items

Within the input box:

  • The submitted description

Warnings and errors

See commonly observed errors.

Overview of the raw variants

  • Top sequence: part of the reference sequence affected by the variant with 25 nucleotide upstream and downstream flanking sequences in 5' to 3' orientation
  • Bottom sequence: the variant sequence with 25 nucleotide upstream and downstream flanking sequences in 5' to 3' orientation

The raw variant description shows the variation type and the position of the variant from the start of the reference sequence.
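As an illustration of the top and bottom sequences described above, the following minimal sketch cuts both views out of a reference string. The function name `overview` and its arguments are hypothetical, positions are counted from the start of the reference sequence (1-based, inclusive), and the actual layout Mutalyzer uses on screen may differ.

```python
FLANK = 25  # number of flanking nucleotides shown on each side


def overview(reference: str, start: int, end: int, inserted: str = ""):
    """Return (top, bottom) for a variant replacing reference[start..end].

    start and end are 1-based and inclusive; `inserted` is the sequence that
    replaces the affected part (empty for a plain deletion).
    """
    left = reference[max(0, start - 1 - FLANK):start - 1]
    affected = reference[start - 1:end]
    right = reference[end:end + FLANK]
    top = left + affected + right      # reference sequence with flanks
    bottom = left + inserted + right   # variant sequence with the same flanks
    return top, bottom


reference = "A" * 30 + "CGT" + "T" * 30
top, bottom = overview(reference, 31, 33)  # deletion of positions 31 to 33
print(top)
print(bottom)
```

Near the ends of the reference sequence fewer than 25 flanking nucleotides are available, so the sketch simply returns the shorter flank.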

  • The "View original variant in UCSC Genome Browser" link. Click this link to see the Mutalyzer custom variant track in the UCSC Genome Browser. Please note that the Base Position track displayed as Full will show amino acid codons for the forward orientation of the chromosomal reference sequence, whereas the codon affected might be on the reverse strand.

Genomic description

The genomic description of the variant using the reference sequence specified (only shown for genomic sequence records). If the reference sequence annotation contains a mapping to a chromosomal reference sequence, the corresponding description will be listed under the heading "Alternative chromosomal position".

Description relative to transcription start

Only shown for transcript sequence records. (Not for use in LSDBs in case of protein-coding transcripts.) The description of the variant using the non-coding transcript position numbering. The link should not be used in combination with protein-coding transcripts, since the n. position will be interpreted as a c. position!

Affected transcripts

Lists all descriptions relative to transcript variants of genes affected by the variant. The descriptions are not predictions of variant effects at the RNA level.

Note: Substitution descriptions for genes transcribed in the opposite orientation will use the reverse complement of the nucleotides shown in the genomic description. Positions of insertions and deletions in those transcripts can shift in opposite directions due to the position shift rule: according to the standard nomenclature, a deletion of a G in a stretch of G's is described using the position of the most 3' G.
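The note above can be made concrete with a short sketch. It is an illustration under simple assumptions (plain DNA strings, 1-based inclusive positions), and the helper names `reverse_complement` and `shift_deletion_3prime` are hypothetical, not part of Mutalyzer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def shift_deletion_3prime(reference: str, start: int, end: int):
    """Move a deletion of reference[start..end] to its most 3' position.

    Deleting start..end and deleting start+1..end+1 give the same variant
    sequence when the first deleted base equals the base just after the
    deletion, so the deletion is shifted for as long as that holds.
    """
    while end < len(reference) and reference[start - 1] == reference[end]:
        start += 1
        end += 1
    return start, end


forward = "ATGGGC"
reverse = reverse_complement(forward)  # the same region in the other orientation

# A substitution G>T on the forward strand reads C>A on the reverse strand.
print(reverse_complement("G"), ">", reverse_complement("T"))

# Deleting one G from the stretch of G's: the most 3' G is chosen.
print(shift_deletion_3prime(forward, 3, 3))

# The same deletion seen from the reverse strand is a deleted C, and the
# 3' rule now shifts it towards the other end of the stretch.
print(shift_deletion_3prime(reverse, 2, 2))
```

In this toy region the forward description ends up on the third G (position 5 of `forward`), whereas the reverse-strand description ends up on the C at position 4 of `reverse`, which pairs with the first G (position 3 of `forward`). This is the opposite shift the note refers to.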

Affected proteins

Lists all descriptions relative to protein isoforms of genes affected by the variant. The protein variant descriptions following the p. prefix are shown between parentheses to indicate that they are predictions. The descriptions are generated by translation of the variant coding sequence under the simple assumption that the annotated splice sites unaffected by the variant are still used.
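The prediction step described above can be sketched as a plain translation of the reference and variant coding sequences. This is a minimal illustration that assumes the input is already the spliced coding sequence (so the annotated splice sites are taken as given) and uses the standard genetic code; `translate` and `substitute` are hypothetical helper names, and the toy sequence is made up.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to and including the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def substitute(cds: str, position: int, new_base: str) -> str:
    """Apply a substitution at a 1-based coding (c.) position."""
    return cds[:position - 1] + new_base + cds[position:]


reference_cds = "ATGGGTTGA"                       # toy coding sequence
variant_cds = substitute(reference_cds, 4, "T")   # c.4G>T

reference_protein = translate(reference_cds)
variant_protein = translate(variant_cds)
changed = [
    i + 1
    for i, (ref, var) in enumerate(zip(reference_protein, variant_protein))
    if ref != var
]
print(reference_protein, variant_protein, changed)
```

Here the second amino acid changes from Gly to Cys, which would be reported between parentheses as p.(Gly2Cys) because it is a prediction.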

Detailed information about the selected transcript and predicted protein

Only displayed when descriptions relative to a specific transcript or protein are checked.

Reference protein

Reference protein sequence in single letter amino acid code. Amino acids affected by the variant are shown in red.

Protein predicted from variant coding sequence

Predicted variant protein sequence in single letter amino acid code. Amino acids not present in the reference protein are shown in red.

Additional information about the transcript

Exon information

Transcript information extracted from the reference sequence annotation, presented in tabular format. Lists all exons of the transcript with their corresponding numbers, genomic (g.) start and end positions and coding DNA (c.) start and end positions.

CDS information

Lists the Coding sequence (CDS) start and end positions extracted from the reference sequence annotation.

Effects on Restriction sites

Lists all restriction sites that are created or deleted by the variant. Restriction sites are identified in the sequence using Biopython. The list is created by comparing the restriction sites present in the reference sequence with those present in the variant sequence.
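The comparison itself can be illustrated without Biopython. The sketch below is a stand-in, not the Biopython-based code the Name Checker uses: it searches for a small, hand-picked table of recognition sites (`ENZYMES`, an assumption made for this example) and reports the enzymes whose number of sites goes up or down. The function names and the toy sequences are hypothetical.

```python
# Recognition sequences of three common enzymes. All three are palindromic,
# so searching one strand is sufficient for this illustration.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site, including overlapping ones."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def compare_sites(reference: str, variant: str, enzymes=ENZYMES):
    """Return (created, deleted): enzymes that gain or lose sites in the variant."""
    created, deleted = [], []
    for name, site in enzymes.items():
        before = count_sites(reference.upper(), site)
        after = count_sites(variant.upper(), site)
        if after > before:
            created.append(name)
        elif after < before:
            deleted.append(name)
    return created, deleted


reference = "AAGAATTCTTGGATACAA"   # contains one EcoRI site
variant = "AAGAATACTTGGATCCAA"     # EcoRI site destroyed, BamHI site created
print(compare_sites(reference, variant))
```

Counting sites rather than testing for presence means that losing one of two identical sites is still reported as a deletion.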

Legend

Lists all genes, transcript variants and protein isoforms extracted from the reference sequence annotation and the method to link them to each other.

Links

Allows the user to download the reference sequence file.

Test Examples

AB026906.1:c.3_4insG

AB026906.1:c.[1del;4G>T]

AL449423.14(CDKN2A_v1):c.1_10del

UD_127955523176(DMD_v002):c.136G>T

LRG_1t1:c.266G>T
