clarify biological begin/end positions #26

Open
nlwashington opened this issue Jul 6, 2015 · 2 comments

Comments

@nlwashington

Can you clarify what you mean by the 'biological' begin/end position to use in your model?

There are a couple of ways this could be interpreted from the current documentation:

  1. the begin/end of where transcription starts/ends
  2. the begin/end of the resulting (active) form of the product of transcription and/or translation.

When strandedness is known, option 1 works for all cases. Option 2 is confusing for miRNAs and other RNA products whose active form is the complement: for an RNA gene on the negative strand, a "two negatives make a positive" rule would have to be applied depending on the feature type.
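A minimal Python sketch of the two readings, assuming a feature is given as a low/high coordinate pair plus a strand of `"+"` or `"-"`. The names `begin_end`, `option1`, `option2`, and `COMPLEMENT_ACTIVE_TYPES` are hypothetical, and the set of complement-active types is illustrative only; it encodes the "two negatives make a positive" rule described above, not any FALDO API.

```python
from typing import Tuple

# Hypothetical, illustrative set of feature types whose active product is
# taken to be the complement of the transcribed strand.
COMPLEMENT_ACTIVE_TYPES = {"miRNA"}


def begin_end(low: int, high: int, strand: str) -> Tuple[int, int]:
    """Order a low/high coordinate pair as (begin, end) for a strand.

    Forward strand: begin is the lower coordinate.
    Reverse strand: begin is the higher coordinate.
    """
    if low > high:
        raise ValueError("expected low <= high")
    if strand == "+":
        return (low, high)
    if strand == "-":
        return (high, low)
    raise ValueError(f"unknown strand: {strand!r}")


def option1(low: int, high: int, strand: str) -> Tuple[int, int]:
    """Option 1: begin/end follow the direction of transcription."""
    return begin_end(low, high, strand)


def option2(low: int, high: int, strand: str, feature_type: str) -> Tuple[int, int]:
    """Option 2: begin/end follow the active product, so the orientation
    flips for complement-active types (two negatives make a positive)."""
    if feature_type in COMPLEMENT_ACTIVE_TYPES:
        strand = {"+": "-", "-": "+"}.get(strand, strand)
    return begin_end(low, high, strand)
```

For a feature spanning 1..101 on the negative strand, `option1(1, 101, "-")` gives `(101, 1)` whatever the feature type, while `option2(1, 101, "-", "miRNA")` gives `(1, 101)`. Under option 2 the answer depends on the feature type, which is what makes it harder for consumers.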

Furthermore, when a region is bounded by two BothStrandsPosition positions, what is its biological start/end? Is it intended to default to start < end, with either order being valid? Is it then up to the consumer of the data to reconcile the fact that there could be two equivalent regions with start and end swapped?

@peterjc
Member

peterjc commented Jul 7, 2015

On the BothStrandsPosition question, presumably there is (often?) no meaningful biological start vs. end, so I would also have assumed a default of start < end, according to the numerical order of the reference sequence.
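The suggested default can be sketched in Python as a normalization step, assuming a region bounded by two both-strand positions is represented as a plain pair of integer coordinates. `canonical_both_strands` is a hypothetical helper, not part of FALDO.

```python
from typing import Tuple


def canonical_both_strands(begin: int, end: int) -> Tuple[int, int]:
    """Return (begin, end) with begin <= end.

    For a region bounded by two BothStrandsPosition positions there is
    (often) no meaningful biological start, so fall back to the numerical
    order of the reference sequence.
    """
    return (begin, end) if begin <= end else (end, begin)
```

Applying this when writing data means a mirrored pair such as `(20, 10)` is always stored as `(10, 20)`.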

@JervenBolleman
Collaborator

Good questions.

The point here is what you are annotating. Most of the time you should go for option 1. However, I think this encourages introducing new predicates. We tend to say that faldo is about locating features on a sequence, but in many real ways a miRNA is not located on a DNA genome. The modeling would then be something like this:

@prefix faldo: <http://biohackathon.org/resource/faldo#> .
@prefix example: <http://example.org/> .  # hypothetical namespace

<mRNA_Transcription_Region_1> a example:mRNA_Transcription_Region ,
                                  faldo:Region ;
    faldo:begin [ faldo:position 1 ;
                  faldo:reference <a_genome> ] ;
    faldo:end   [ faldo:position 101 ;
                  faldo:reference <a_genome> ] .

<mRNA_1> example:transcribedFrom <mRNA_Transcription_Region_1> .

So, thinking it through, option 1 ends up being the natural modeling choice: the unit of annotation will most likely be a transcription region, so using the begin/end of where transcription starts/ends is natural.
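A minimal Python sketch of option 1 applied to a Turtle example like the one above, assuming the transcription region is known as a low/high coordinate pair plus a strand. It writes begin at the transcription start, so on the reverse strand begin holds the higher coordinate. The helper `transcription_region_ttl` and the `example:` namespace are hypothetical. `faldo:ExactPosition`, `faldo:ForwardStrandPosition`, `faldo:ReverseStrandPosition`, `faldo:position`, and `faldo:reference` are FALDO terms, but check them against the current ontology before relying on this output.

```python
def transcription_region_ttl(iri: str, genome: str, low: int, high: int, strand: str) -> str:
    """Serialize a transcription region as Turtle, option 1 style.

    begin is where transcription starts and end is where it stops, so on
    the reverse strand begin holds the higher coordinate.
    """
    if low > high:
        raise ValueError("expected low <= high")
    if strand == "+":
        begin, end, strand_cls = low, high, "faldo:ForwardStrandPosition"
    elif strand == "-":
        begin, end, strand_cls = high, low, "faldo:ReverseStrandPosition"
    else:
        raise ValueError(f"unknown strand: {strand!r}")

    def position(pos: int) -> str:
        # One blank-node position on the given reference sequence.
        return (f"[ a faldo:ExactPosition , {strand_cls} ;\n"
                f"      faldo:position {pos} ;\n"
                f"      faldo:reference <{genome}> ]")

    return "\n".join([
        "@prefix faldo: <http://biohackathon.org/resource/faldo#> .",
        "@prefix example: <http://example.org/> .",  # hypothetical namespace
        "",
        f"<{iri}> a example:mRNA_Transcription_Region , faldo:Region ;",
        f"    faldo:begin {position(begin)} ;",
        f"    faldo:end {position(end)} .",
    ])
```

For example, `transcription_region_ttl("mRNA_Transcription_Region_1", "a_genome", 1, 101, "-")` puts `faldo:position 101` under `faldo:begin`, because transcription on the reverse strand starts at the higher coordinate.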

In the BothStrandsPosition case, where the biology of the region is symmetrical, the start and end can be mirrored because there is no 'definitive' biological start. The default you suggest (start < end in reference-sequence order) should then be documented as the preferred option.
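For consumers reconciling data that did not follow this default, a both-strands region and its mirror can be treated as the same region by comparing their sorted bounds. A minimal Python sketch, assuming regions are `(reference, begin, end)` tuples; `same_both_strands_region` and `dedupe` are hypothetical helpers.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (reference, begin, end); hypothetical representation


def same_both_strands_region(a: Region, b: Region) -> bool:
    """True if two both-strand regions cover the same span on the same
    reference, regardless of which bound is labelled begin or end."""
    ref_a, *bounds_a = a
    ref_b, *bounds_b = b
    return ref_a == ref_b and sorted(bounds_a) == sorted(bounds_b)


def dedupe(regions: List[Region]) -> List[Region]:
    """Keep one region per span, normalized to begin <= end,
    in first-seen order."""
    seen = {}
    for ref, begin, end in regions:
        key = (ref, min(begin, end), max(begin, end))
        seen.setdefault(key, None)
    return list(seen)
```

With this, `("chr1", 20, 10)` and `("chr1", 10, 20)` collapse to a single `("chr1", 10, 20)`.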


3 participants