variantRequestParameters.md
mbaudis committed Mar 18, 2024
1 parent 24f49f6 commit 95d09de
Showing 3 changed files with 213 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/query-types/index.md
@@ -47,7 +47,7 @@ a positional translation is being applied.
### _Aminoacid Change Queries_: `AminoacidRequest`[^1]

Annotated variants can potentially be queried using the single amino acid replacement
format. The `aminoacidChange` parameter may[^2] be combined with e.g. a `geneId` to increase
format. The `aminoacidChange` parameter may (must?[^2]) be combined with e.g. a `geneId` to increase
specificity.


211 changes: 211 additions & 0 deletions docs/query-types/variantRequestParameters.md
@@ -0,0 +1,211 @@
# Variation Query Parameters

!!! note "Current Beacon variation query parameters"

    The parameters currently implemented can be looked up in the Beacon v2
    default model's [`genomicVariations/requestParameters`](https://github.com/ga4gh-beacon/beacon-v2/blob/main/models/src/beacon-v2-default-model/genomicVariations/requestParameters.yaml).

## Beacon v2 Variant Request Parameters

### `assemblyId`

In the original Beacon v2 specification the `assemblyId` parameter refers to the
genomic assembly accession and version, given either as a RefSeq assembly accession
(e.g. "GCF_000001405.39") or as a versioned assembly name or synonym, such as a
UCSC Genome Browser assembly name (e.g. "hg38") or a Genome Reference Consortium
Human name (e.g. "GRCh38.p13").
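Because `assemblyId` values are this polymorphic, a Beacon server might normalize them to a single canonical name before matching. The following is a minimal sketch only: `ASSEMBLY_SYNONYMS` and `normalize_assembly_id` are hypothetical names, and the synonym table is a small illustrative subset, not a list defined by Beacon.

```python
# Hypothetical synonym table: keys are lower-cased inputs, values are the
# canonical assembly names chosen for this sketch (not a Beacon convention).
ASSEMBLY_SYNONYMS = {
    "grch38": "GRCh38",
    "grch38.p13": "GRCh38",
    "hg38": "GRCh38",
    "gcf_000001405.39": "GRCh38",  # RefSeq assembly accession of GRCh38.p13
    "grch37": "GRCh37",
    "hg19": "GRCh37",
}


def normalize_assembly_id(assembly_id: str) -> str:
    """Map an assemblyId synonym to a canonical name, or raise ValueError."""
    key = assembly_id.strip().lower()
    try:
        return ASSEMBLY_SYNONYMS[key]
    except KeyError:
        raise ValueError(f"unknown assemblyId: {assembly_id!r}") from None
```

Collapsing patch releases such as GRCh38.p13 onto the base assembly is a design choice of this sketch; it relies on patch releases leaving coordinates on the primary chromosomes unchanged.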

#### Scouts TODO

* Revise / tighten these definitions, since they are highly polymorphic; at least
  document a clear preference?
* Abandon the parameter in favor of versioned `referenceName` values? Currently both
  are allowed.


### `referenceName`

The `referenceName` parameter matches the sequence ID of the genomic sequence
(e.g. chromosome) on which the variant coordinates (`start`, `end` ...) are given.
Preferably this is a RefSeq ID, alternatively a common synonym or alias.

#### Examples

- `refseq:NC_000009.12`
- `NC_000009.12`
- `Chr9`
- `9`
- `NC_012920.1`
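The example values above could be reduced to a single RefSeq accession on the server side. A minimal sketch, assuming GRCh38 and a hypothetical lookup table `CHROMOSOME_TO_REFSEQ` that only contains the chromosomes used in the examples; `normalize_reference_name` is likewise a hypothetical helper.

```python
import re

# Hypothetical GRCh38 lookup; only the chromosomes used in the examples.
CHROMOSOME_TO_REFSEQ = {
    "9": "NC_000009.12",
    "MT": "NC_012920.1",  # mitochondrial genome (rCRS)
}

# Versioned RefSeq chromosome accession such as NC_000009.12.
_REFSEQ_ACCESSION = re.compile(r"NC_\d{6}\.\d+")


def normalize_reference_name(value: str) -> str:
    """Reduce a referenceName value to a bare RefSeq accession."""
    name = value.strip()
    # Drop an optional "refseq:" CURIE prefix, in any letter case.
    if name.lower().startswith("refseq:"):
        name = name[len("refseq:"):]
    if _REFSEQ_ACCESSION.fullmatch(name):
        return name
    # Treat anything else as a chromosome alias, e.g. "Chr9", "chrM" or "9".
    alias = name.upper()
    if alias.startswith("CHR"):
        alias = alias[3:]
    if alias == "M":
        alias = "MT"
    try:
        return CHROMOSOME_TO_REFSEQ[alias]
    except KeyError:
        raise ValueError(f"unknown referenceName: {value!r}") from None
```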

#### Scouts TODO

* More stringent ID use?
* New explicit parameter?

### `mateName`

The `mateName` parameter takes a sequence ID as in the `referenceName` definition,
for use cases describing a fusion event (to a different or the same chromosome).
While the parameter has been defined in the Beacon v2 default model, so far there
has been no good definition or documentation of its use cases (although intuitively
there are many practical cases for translocation/fusion events).

#### Scouts TODO

* document use cases
* define query prototype(s)

### `start`

Precise or fuzzy start coordinate position(s) for a variation locus (0-based, inclusive).
The use depends on the query type:

* `start` only:
    - for single positions, e.g. the start of a specified sequence
      alteration where the size is given through the specified `alternateBases`
    - typical uses are queries for SNVs and small InDels
    - the use of `start` without an `end` parameter requires the use of
      `alternateBases`
* `start` and `end`:
    - for searching any variant falling fully or partially within the range
      between `start` and `end` (a.k.a. "range query")
    - additional use of `variantType` OR `alternateBases` can limit the
      scope of the query
    - by convention, partial overlaps of variants with the indicated genomic
      range are accepted; for specific overlap requirements the 4-parameter
      "Bracket Queries" should be employed
* 2 values in both `start` and `end` for constructing a "Bracket Query":
    - can be used to match any contiguous genomic interval, e.g. for querying
      imprecise positions
    - identifies all structural variants starting between `start[0]` and `start[1]`,
      and ending between `end[0]` and `end[1]`
    - single- or double-sided precise matches can be achieved by setting
      `start[1]=start[0]+1` and/or `end[1]=end[0]+1`
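The range and bracket cases can be expressed as simple interval predicates. This is a sketch under stated assumptions: `Variant`, `matches_range`, `matches_bracket` and `matches_location` are hypothetical names, and all intervals are treated as half-open `[lower, upper)`, which is the reading under which `start[1]=start[0]+1` selects exactly one position. Start-only sequence queries are not covered here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Variant:
    """A stored variant with 0-based coordinates (hypothetical record type)."""
    start: int
    end: int


def matches_range(v: Variant, start: int, end: int) -> bool:
    """Range query: accept any full or partial overlap with [start, end)."""
    return v.start < end and v.end > start


def matches_bracket(v: Variant, start: Sequence[int], end: Sequence[int]) -> bool:
    """Bracket query: v.start in [start[0], start[1]), v.end in [end[0], end[1])."""
    return start[0] <= v.start < start[1] and end[0] <= v.end < end[1]


def matches_location(v: Variant, start: Sequence[int], end: Sequence[int]) -> bool:
    """Dispatch on the number of start/end values, mirroring the cases above."""
    if len(start) == 1 and len(end) == 1:
        return matches_range(v, start[0], end[0])
    if len(start) == 2 and len(end) == 2:
        return matches_bracket(v, start, end)
    raise ValueError("expected one or two values in both start and end")
```

For a stored CNV spanning 1000 to 5000, the range query `[4000]`/`[6000]` matches through partial overlap, and the bracket query `[1000, 1001]`/`[5000, 5001]` matches only a variant with exactly those boundaries.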

#### Scouts TODO

* de-convolute documentation, _i.e._ move specifics to the query type definitions

### `end`

Precise or fuzzy end coordinate position(s) for a variation locus (0-based, inclusive).
This is commonly used for variations without a specified sequence (although e.g. a
range and a sequence motif could be combined).

### `alternateBases`

Sequence of bases for this variation (starting from `start`).

* Accepted values: [ACGTN]
* N is a wildcard that denotes a position of any base; it can be used as
  a standalone base or within a partially known sequence. For example, in a
  query for `ANNT` each N can take any form of [ACGT], so the query will match
  `ANNT`, `ACNT`, `ACCT`, `ACGT` ... and so forth.
* an _empty value_ is used in the case of deletions, with the maximally
  trimmed, deleted sequence being indicated in `referenceBases`
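The N wildcard semantics can be sketched by translating a query into a regular expression. `alternate_bases_pattern` and `matches_alternate_bases` are hypothetical helpers; as in the example above, a query N is assumed to also match a stored N.

```python
import re

# Accepted query characters according to the description above.
_ALLOWED = re.compile(r"[ACGTN]*")


def alternate_bases_pattern(query: str) -> "re.Pattern":
    """Compile an alternateBases query into a regex where N means any base."""
    query = query.upper()
    if not _ALLOWED.fullmatch(query):
        raise ValueError(f"alternateBases must match [ACGTN]*: {query!r}")
    # Each N accepts A, C, G, T and also a stored N (unknown base).
    return re.compile("".join("[ACGTN]" if b == "N" else b for b in query))


def matches_alternate_bases(query: str, stored: str) -> bool:
    """True if the stored sequence has the same length and fits the query."""
    return alternate_bases_pattern(query).fullmatch(stored.upper()) is not None
```

Using `fullmatch` keeps the length fixed, so `ANNT` does not match a five-base sequence; an empty query only matches an empty stored value, as for deletions.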

#### Scouts TODO

* resolve the ambiguity in the schema, where the pattern `^([ACGTUNRYSWKMBDHV\-\.]*)$`
  is also given
* future use of `sequence` as in VRS?


### `referenceBases`
Sequence of bases which have been replaced by the variation (from `start`). The
use of characters is equivalent to the `alternateBases` parameter.
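The "maximally trimmed" representation mentioned for deletions can be illustrated by removing bases shared between `referenceBases` and `alternateBases`. A minimal sketch with a hypothetical `trim_alleles` helper, assuming a 0-based `start`; it performs simple suffix-then-prefix trimming and is not the full VRS normalization, so variants in repeats may be placed differently than a fully justified representation would place them.

```python
from typing import Tuple


def trim_alleles(start: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove shared trailing, then leading bases; shift the 0-based start."""
    # Trim common trailing bases first (start is unaffected).
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim common leading bases, advancing start by one per removed base.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    return start, ref, alt
```

For a VCF-style deletion at 1-based `POS` 100 with `REF` `AT` and `ALT` `A` (0-based start 99), `trim_alleles(99, "AT", "A")` gives start 100, `referenceBases` `T` and an empty `alternateBases`.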


#### Scouts TODO

* determine if `referenceBases` should be abandoned ...

### `variantType`

The `variantType` parameter is used to query variants which are not defined through
a sequence of one or more bases (using the `alternateBases` parameter) or through an
alternative query type using non-sequence parameters (e.g. `aminoacidChange`).

The Beacon v2 schema uses some "VCF-like" examples without being prescriptive,
e.g. for structural variants:

* DUP
    - increased allelic count of material from the genomic region between
      `start` and `end` positions
    - no assumption about the placement of the additional sequences is being
      made (i.e. no _in situ_ requirement as for tandem duplications)
* DEL: deletion of sequence following `start`
* BND: breakend, i.e. termination of the allele at position `start` or in
  the `startMin` => `startMax` interval, or fusion of the sequence to a distant
  partner

Either `alternateBases` or `variantType` is required, with the exception
of range queries (single `start` and `end` parameters).
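The parameter combination rules stated for `start`, `end`, `alternateBases` and `variantType` can be collected into one validation step. A sketch only: `classify_query` and its return labels are hypothetical, and it encodes just the rules written in this document.

```python
from typing import Optional, Sequence


def classify_query(
    start: Sequence[int],
    end: Optional[Sequence[int]] = None,
    alternate_bases: Optional[str] = None,
    variant_type: Optional[str] = None,
) -> str:
    """Return "sequence", "range" or "bracket", or raise ValueError."""
    end = end or []
    if len(start) == 1 and not end:
        # start without end requires alternateBases ("" is a valid deletion).
        if alternate_bases is None:
            raise ValueError("a start-only query requires alternateBases")
        return "sequence"
    if len(start) == 1 and len(end) == 1:
        # Range query: alternateBases / variantType are optional filters here.
        return "range"
    if len(start) == 2 and len(end) == 2:
        if alternate_bases is None and variant_type is None:
            raise ValueError("a bracket query requires alternateBases or variantType")
        return "bracket"
    raise ValueError("unsupported combination of start/end values")
```

Checking `is None` rather than truthiness keeps the empty `alternateBases` value used for deletions valid.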

#### Scouts TODO

* revise regarding the preferential use of CURIEs (_i.e._ EFO or SO classes), as
has been [exemplified in the Beacon v2 documentation](http://docs.genomebeacons.org/variant-queries/#cnv-term-use-comparison-in-computational-fileschema-formats)
for CNVs
* revise/delete notes about parameter combinations which should be moved to the
query type definitions

### `variantMinLength`

* Minimum length in bases of a genomic variant
* This is an optional parameter without prescribed use. While a length is commonly
available for structural variants such as copy number variations, it is recommended
that length-based queries also be supported for variants with indicated
`referenceBases` and `alternateBases`, to enable length-specific wildcard queries.

#### Scouts TODO

* check definition

### `variantMaxLength`

* Maximum length in bases of a genomic variant.
* otherwise as above
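Length filtering could look roughly like this. Since the length definition is still open, this sketch makes two explicit assumptions: interval variants use `end - start` (half-open coordinates), and sequence variants use the longer of `referenceBases` and `alternateBases`. `variant_length` and `length_matches` are hypothetical helpers, and the min/max bounds are treated as inclusive.

```python
from typing import Optional


def variant_length(
    start: int,
    end: Optional[int] = None,
    reference_bases: Optional[str] = None,
    alternate_bases: Optional[str] = None,
) -> int:
    """Length in bases under the assumptions stated above."""
    if reference_bases is not None or alternate_bases is not None:
        # Sequence variant: the longer of the two alleles (assumption).
        return max(len(reference_bases or ""), len(alternate_bases or ""))
    if end is None:
        raise ValueError("need end, referenceBases or alternateBases")
    # Interval variant (e.g. a CNV), assuming half-open coordinates.
    return end - start


def length_matches(
    length: int,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> bool:
    """Apply variantMinLength / variantMaxLength as inclusive bounds."""
    if min_length is not None and length < min_length:
        return False
    if max_length is not None and length > max_length:
        return False
    return True
```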

### `geneId`

* A gene identifier
* It is strongly recommended to use a gene symbol following the
  [HGNC](https://www.genenames.org) nomenclature.

#### Examples

* BRAF
* SCN5A

#### Scouts TODO

* check definition

### `aminoacidChange`

An amino acid alteration in one-letter format.

#### Examples

* V600E
* M734V
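A one-letter `aminoacidChange` value can be split into reference residue, position and alternate residue. A minimal sketch with hypothetical names (`AminoacidChange`, `parse_aminoacid_change`), restricted to the 20 standard one-letter codes; stop codons (`*`) and three-letter notation are deliberately not handled.

```python
import re
from typing import NamedTuple

# One-letter codes of the 20 standard amino acids (assumed scope).
_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_AA_CHANGE = re.compile(rf"([{_AMINO_ACIDS}])([1-9][0-9]*)([{_AMINO_ACIDS}])")


class AminoacidChange(NamedTuple):
    reference: str
    position: int
    alternate: str


def parse_aminoacid_change(value: str) -> AminoacidChange:
    """Parse e.g. "V600E" into ("V", 600, "E")."""
    m = _AA_CHANGE.fullmatch(value.strip().upper())
    if m is None:
        raise ValueError(f"not a single amino acid change: {value!r}")
    return AminoacidChange(m.group(1), int(m.group(2)), m.group(3))
```

Combined with e.g. a `geneId` of `BRAF`, `parse_aminoacid_change("V600E")` yields reference `V`, position 600 and alternate `E`.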

#### Scouts TODO

* check definition

### `genomicAlleleShortForm`

A genomic HGVSId descriptor.

#### Examples

* NM_004006.2:c.4375C>T
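Simple substitutions in this form can be decomposed with a regular expression. A hypothetical sketch (`HgvsSubstitution`, `parse_hgvs_substitution`) that accepts only single-nucleotide substitutions with `c.`, `g.`, `m.` or `n.` coordinates; other HGVS forms (deletions, insertions, intronic offsets, ranges) would need a full HGVS parser.

```python
import re
from typing import NamedTuple

# Only single-nucleotide substitutions such as "NM_004006.2:c.4375C>T".
_HGVS_SUBSTITUTION = re.compile(
    r"(?P<accession>[A-Z]{2}_\d+\.\d+)"  # versioned RefSeq accession
    r":(?P<kind>[cgmn])\."  # coordinate type prefix, e.g. c. or g.
    r"(?P<position>\d+)"  # position in that coordinate system
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"  # reference > alternate base
)


class HgvsSubstitution(NamedTuple):
    accession: str
    kind: str
    position: int
    reference: str
    alternate: str


def parse_hgvs_substitution(value: str) -> HgvsSubstitution:
    """Split a simple HGVS substitution into its components."""
    m = _HGVS_SUBSTITUTION.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a simple HGVS substitution: {value!r}")
    return HgvsSubstitution(
        m["accession"], m["kind"], int(m["position"]), m["ref"], m["alt"]
    )
```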

#### Scouts TODO

* expand definition and examples

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -12,6 +12,7 @@ schemas_path: /blob/main/schemas/
nav:
- 'Home': /
- Variation Types: variant-types
- Request Parameters: query-types/variantRequestParameters
- Query Types: query-types
- Query Schema Source Files: https://github.com/ga4gh-beacon/variant-query-types/tree/main/schemas
