fusion query with donor and acceptor parameters #1

Open
costero-e opened this issue Jul 10, 2024 · 3 comments

Comments

@costero-e
Collaborator

As Beacon still falls short of supporting comprehensive fusion queries, my colleagues and I have come up with a possible solution, which I present here for discussion in the Variants Scout.
The proposal is based on the nomenclature commonly used in bioinformatics for translocated genes, which specifies both the originating and the receiving chromosome, along with the start and end positions on each.
In my experience, the nomenclature usually looks like this:
CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd
That said, in our opinion Beacon should introduce brand-new names for these donor and acceptor position parameters, but could reuse referenceName and mateName for the origin and destination chromosomes, respectively.
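To make the mapping from notation to parameters concrete, here is a minimal Python sketch that splits such a string into the proposed parameter names. The regular expression, the function name `parse_fusion_notation`, and the example string (assembled from positions in the example query) are illustrative assumptions, not part of any Beacon specification. Note that the notation carries single positions, whereas the query proposed next allows a range for each.

```python
import re
from typing import Dict, Union

# Hypothetical parser for the "CHR:donorStart-donorEnd:CHR:acceptorStart-acceptorEnd"
# notation. Assumes chromosome names are word characters ("11", "X", "chr12")
# and positions are plain integers.
FUSION_NOTATION = re.compile(
    r"^(?P<referenceName>\w+):(?P<donorStart>\d+)-(?P<donorEnd>\d+)"
    r":(?P<mateName>\w+):(?P<acceptorStart>\d+)-(?P<acceptorEnd>\d+)$"
)

CHROMOSOME_KEYS = ("referenceName", "mateName")


def parse_fusion_notation(text: str) -> Dict[str, Union[str, int]]:
    """Map a fusion string onto the parameter names proposed in this issue."""
    match = FUSION_NOTATION.match(text.strip())
    if match is None:
        raise ValueError(f"expected CHR:start-end:CHR:start-end, got {text!r}")
    # Chromosome names stay strings; the four positions become integers.
    return {
        key: value if key in CHROMOSOME_KEYS else int(value)
        for key, value in match.groupdict().items()
    }


print(parse_fusion_notation("11:16086165-16086171:12:16090071-16090074"))
```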
A complete Beacon query could then look like this:
g_variants?referenceName=11&mateName=12&donorStart=[16086165,16086170]&donorEnd=[16086171,16086175]&acceptorStart=[16090071,16090073]&acceptorEnd=[16090074,16090075]
We would prefer new parameters such as donorStart or acceptorStart to avoid misusing the original start and end parameters, whose meaning is conventional; reusing them could confuse implementers as well as Beacon users and clients.
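As a sketch of how a client might assemble the proposed request, the following Python builds the example query string from (low, high) ranges. The parameters donorStart, donorEnd, acceptorStart, and acceptorEnd are only proposed here and are not part of the current Beacon specification; the helper names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode

Range = Tuple[int, int]


def as_bracket(bounds: Range) -> str:
    """Render a (low, high) position range as "[low,high]", as in the example query."""
    low, high = bounds
    if low > high:
        raise ValueError(f"range start {low} exceeds range end {high}")
    return f"[{low},{high}]"


def donor_acceptor_query(reference_name: str, mate_name: str,
                         donor_start: Range, donor_end: Range,
                         acceptor_start: Range, acceptor_end: Range) -> str:
    """Build a g_variants request using the proposed (non-standard) donor/acceptor parameters."""
    params = {
        "referenceName": reference_name,
        "mateName": mate_name,
        "donorStart": as_bracket(donor_start),
        "donorEnd": as_bracket(donor_end),
        "acceptorStart": as_bracket(acceptor_start),
        "acceptorEnd": as_bracket(acceptor_end),
    }
    # Brackets and commas are left unescaped to mirror the example; a client
    # following RFC 3986 strictly would percent-encode the brackets instead.
    return "g_variants?" + urlencode(params, safe="[],")


print(donor_acceptor_query("11", "12",
                           (16086165, 16086170), (16086171, 16086175),
                           (16090071, 16090073), (16090074, 16090075)))
```

Keeping each range as a pair, rather than two loose integers, lets the helper reject an inverted range before the request is sent.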
Let me know what you think.
Best,
Oriol

@mbaudis
Member

mbaudis commented Jul 11, 2024

@costero-e I appreciate this as a push to advance with fusions, but IMO it adds unnecessary complexity and extra arguments to the query:

  • a fusion consists of 2 partners
  • the fusion partners' breakpoints might be on different chromosomes or on the same one
  • the positions on the given reference sequences (chromosomes...) are frequently "fuzzy" (cytobands, empirical)

For detecting the 2 fusion partners, this leads to the basic requirement of querying 2 chromosomes, each with an associated range. This can already be achieved with the current parameters:

  • referenceName + start[0] + start[1]
  • mateName + end[0] + end[1]
  • variantType ... (e.g. SO:0000806 "fusion")
  • conventions
    • mateName >= referenceName in sort order
    • if mateName == referenceName: end[0] > start[0] (the usual convention, which does not apply if mateName != referenceName)
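The conventions listed above can be sketched as a small normalization step that puts the two partners in order before building the bracket parameters. This is a minimal Python illustration under stated assumptions: "sort order" is taken to mean the usual human chromosome order (1 to 22, X, Y, MT), which the thread does not pin down, and the helper names are hypothetical. The result is returned as a dict rather than a URL, since the thread does not show how array-valued start/end are encoded in a GET request.

```python
from typing import Dict, List, Tuple, Union

# Assumption: "sort order" means the conventional human chromosome order below.
CHROMOSOME_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]

Partner = Tuple[str, Tuple[int, int]]  # (chromosome, (rangeStart, rangeEnd))


def chromosome_rank(name: str) -> int:
    """Position of a chromosome in CHROMOSOME_ORDER; accepts "chr11" or "11"."""
    bare = name[3:] if name.lower().startswith("chr") else name
    return CHROMOSOME_ORDER.index(bare.upper())  # ValueError for unknown names


def fusion_bracket_query(a: Partner, b: Partner) -> Dict[str, Union[str, List[int]]]:
    """Order two fusion partners by the stated conventions and emit bracket parameters."""
    for chromosome, (low, high) in (a, b):
        if low > high:
            raise ValueError(f"empty range on {chromosome}: {low} > {high}")
    # Lower chromosome first; on the same chromosome, the earlier range first.
    first, second = sorted((a, b), key=lambda p: (chromosome_rank(p[0]), p[1][0]))
    same_chromosome = chromosome_rank(first[0]) == chromosome_rank(second[0])
    if same_chromosome and not second[1][0] > first[1][0]:
        raise ValueError("same-chromosome partners need end[0] > start[0]")
    return {
        "referenceName": first[0],
        "start": list(first[1]),
        "mateName": second[0],
        "end": list(second[1]),
        "variantType": "SO:0000806",  # "fusion", as suggested above
    }


print(fusion_bracket_query(("12", (16090071, 16090075)), ("11", (16086165, 16086175))))
```

Passing the partners in either order yields the same query, which is the point of fixing a convention.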

In essence, this is a BeaconBracketQuery in which the end bracket's chromosome (usually a different one) is denoted by mateName.
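On the resource side, evaluating such a bracket query against a stored fusion might look like the Python sketch below. The record classes are hypothetical, bracket bounds are assumed to be inclusive, and the check only tests that each breakpoint falls in its bracket; whether an explicit join between the breakpoints is verified is left to implementers, per the caveats that follow.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Query = Dict[str, Union[str, List[int]]]


@dataclass(frozen=True)
class Breakpoint:
    chromosome: str
    position: int


@dataclass(frozen=True)
class FusionRecord:
    """Hypothetical stored fusion: two breakpoints plus a variant type."""
    first: Breakpoint
    second: Breakpoint
    variant_type: str = "SO:0000806"


def in_bracket(bp: Breakpoint, chromosome: str, bracket: List[int]) -> bool:
    # Bracket bounds are treated as inclusive (an assumption).
    return bp.chromosome == chromosome and bracket[0] <= bp.position <= bracket[1]


def matches_bracket_query(record: FusionRecord, query: Query) -> bool:
    """True if one breakpoint lies in the start bracket and the other in the end bracket."""
    if record.variant_type != query["variantType"]:
        return False
    # Try both breakpoint orders, since stored records may not follow the sort convention.
    for a, b in ((record.first, record.second), (record.second, record.first)):
        if (in_bracket(a, query["referenceName"], query["start"])
                and in_bracket(b, query["mateName"], query["end"])):
            return True
    return False


query = {"referenceName": "11", "start": [16086165, 16086175],
         "mateName": "12", "end": [16090071, 16090075], "variantType": "SO:0000806"}
record = FusionRecord(Breakpoint("12", 16090072), Breakpoint("11", 16086170))
print(matches_bracket_query(record, query))  # True
```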

This seems very straightforward, with a few caveats:

  • it is up to the resource implementers whether the combination of these 2 breakpoints with the "fusion" type is checked for an explicit join between the breakpoints; the Beacon specification should not mandate a method here
  • we only provide a "single event option", i.e. no combinations for 3-way fusions etc. (over the top, but can be discussed)
  • beacons may/should support single-sided queries, but these are just a current range query with the "fusion" type, so they don't need a dedicated solution
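A single-sided query, as described in the last caveat, reduces to an ordinary range query restricted to fusion records. The Python sketch below assumes a hypothetical tuple-based record layout and treats a record as a hit if either breakpoint falls inside the requested range (bounds inclusive); it illustrates the idea rather than any specified Beacon behavior.

```python
from typing import Iterable, List, Tuple

Breakpoint = Tuple[str, int]                 # (chromosome, position)
Record = Tuple[Breakpoint, Breakpoint, str]  # (partner 1, partner 2, variantType)


def single_sided_fusion_query(records: Iterable[Record], reference_name: str,
                              start: int, end: int,
                              variant_type: str = "SO:0000806") -> List[Record]:
    """Ordinary range query restricted to fusions: any breakpoint in [start, end] matches."""
    hits: List[Record] = []
    for record in records:
        first, second, vtype = record
        if vtype != variant_type:
            continue
        if any(chrom == reference_name and start <= pos <= end
               for chrom, pos in (first, second)):
            hits.append(record)
    return hits


records = [
    (("11", 16086170), ("12", 16090072), "SO:0000806"),
    (("3", 500), ("11", 16086168), "SO:0000806"),
    (("11", 16086170), ("11", 16086180), "SO:0000159"),  # deletion, not a fusion
]
print(single_sided_fusion_query(records, "11", 16086160, 16086175))
```

Because either partner can satisfy the range, the second record is found even though chromosome 11 is its mate rather than its reference.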

@jrambla

jrambla commented Jul 13, 2024

Hi!

I generally don't recommend using parameters (or columns in a table, or the like) for purposes other than those originally envisioned: in the mid term, such overloading causes issues when either usage (the original or the added one) evolves, and it makes validation and documentation harder and less intuitive.

Therefore, I would not recommend using start and end for positions on different chromosomes, nor adding complex validations on whether mateName sorts higher or lower than referenceName.
Adding more parameters when they add clarity, as in this case, would be my strong recommendation.

@mbaudis
Copy link
Member

mbaudis commented Jul 13, 2024

@jordi Well, the meaningful parameters one could think of would be mateStart and mateEnd, defining a fusion request as a double RangeQuery: referenceName + start (single) + end (single) + mateName + mateStart (single) + mateEnd (single) + variantType. This would be a bit more verbose (the same number of values, but more parameters).

However, conventions such as mateName >= referenceName are a given, since this is how any fusion annotation works: lower chromosome first.

I guess I could support adding those 2 parameters: even though they don't enable anything the current ones can't, they provide clearer labels (labeling the position on the 2nd fusion partner as end isn't really correct).
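The relabeling discussed here can be illustrated with a short Python sketch that converts a bracket-style fusion query into the double RangeQuery form using mateStart and mateEnd. Those two names are only proposed in this thread and are not current Beacon parameters; the helper functions are hypothetical. Counting keys and values shows the trade-off described above: the same number of values, spread over more parameters.

```python
from typing import Dict, List, Union

Params = Dict[str, Union[str, int, List[int]]]


def bracket_to_double_range(query: Params) -> Params:
    """Relabel a bracket-style fusion query with the proposed mateStart/mateEnd names."""
    start_low, start_high = query["start"]
    end_low, end_high = query["end"]
    return {
        "referenceName": query["referenceName"],
        "start": start_low,      # single value: range start on the first partner
        "end": start_high,       # single value: range end on the first partner
        "mateName": query["mateName"],
        "mateStart": end_low,    # range start on the second partner
        "mateEnd": end_high,     # range end on the second partner
        "variantType": query["variantType"],
    }


def count_values(query: Params) -> int:
    """Total number of scalar values carried by a query."""
    return sum(len(v) if isinstance(v, list) else 1 for v in query.values())


bracket = {"referenceName": "11", "start": [16086165, 16086175],
           "mateName": "12", "end": [16090071, 16090075],
           "variantType": "SO:0000806"}
double = bracket_to_double_range(bracket)
print(len(bracket), count_values(bracket))  # 5 7
print(len(double), count_values(double))    # 7 7
```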

Re: Oriol's (@costero-e) suggestion: it doesn't make sense to name something "donor" or "acceptor"; also, we only need 2 positional parameters per fusion partner (to indicate the range of the breakpoint).

Independently of all that, we will need to discuss whether cytoband queries should be supported by the front end / helpers or through the API (i.e. with additional cytoband strings; interestingly, VRS allows CytobandInterval).
