clarify biological begin/end positions #26

nlwashington · 2015-07-06T22:51:24Z

Can you clarify what you mean by the 'biological' begin/end position in your model?

The current documentation could be interpreted in a couple of ways:

1. the begin/end of where transcription starts/ends
2. the begin/end of the resulting (active) form of the product of transcription and/or translation

When strandedness is known, option 1 would work for all cases. Option 2 is confusing for miRNAs and other RNA products where the active form is the complement: for an RNA gene on the negative strand, a two-negatives-make-a-positive rule would need to be applied depending on the feature type.

Furthermore, when a region is bounded by two BothStrandsPosition positions, what is the biological start/end? Is it intended to default to start < end, with either order being valid? Is it then up to the consumer of the data to reconcile that there could be two equivalent regions with the start and end switched?


peterjc · 2015-07-07T08:31:26Z

On the BothStrandsPosition question: presumably there is (often?) no meaningful biological start vs. end. So I would also have assumed a default of start < end, following the numerical order of the reference sequence.

JervenBolleman · 2015-07-07T08:32:26Z

Good questions

The key question is what you are annotating. Most of the time you should go for option 1, although I think it encourages introducing new predicates. We tend to say that FALDO is about locating features on a sequence, but in many real ways a miRNA is not located on a DNA genome. The modeling would then be something like this:

# Each position carries its own coordinate and the sequence it refers to.
<mRNA_Transcription_Region_1> a example:mRNA_Transcription_Region ,
                                faldo:Region ;
                              faldo:begin [ faldo:position 1 ;
                                            faldo:reference <a_genome> ] ;
                              faldo:end [ faldo:position 101 ;
                                          faldo:reference <a_genome> ] .

<mRNA_1> example:transcribedFrom <mRNA_Transcription_Region_1> .

So, thinking it through, option 1 ends up being the natural modeling choice: the unit of annotation will most likely be a transcription region, so using the begin/end of where transcription starts/ends is natural.
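To make option 1 concrete for a data producer, here is a minimal Python sketch of turning numeric coordinates plus a strand into a biological begin/end. It assumes that, for a region on the reverse strand, transcription starts at the numerically higher coordinate. The names `Strand`, `Region`, and `biological_region` are made up for this illustration and are not part of FALDO or any library.

```python
from enum import Enum
from typing import NamedTuple


class Strand(Enum):
    FORWARD = "+"
    REVERSE = "-"


class Region(NamedTuple):
    begin: int  # where transcription starts (option 1)
    end: int    # where transcription ends (option 1)
    strand: Strand


def biological_region(low: int, high: int, strand: Strand) -> Region:
    """Order two reference coordinates by the direction of transcription.

    Assumption: on the reverse strand, transcription runs from the higher
    coordinate to the lower one, so begin > end numerically.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if strand is Strand.REVERSE:
        return Region(begin=high, end=low, strand=strand)
    return Region(begin=low, end=high, strand=strand)
```

With the coordinates from the example, `biological_region(1, 101, Strand.FORWARD)` keeps begin at 1, while the same span on the reverse strand gets begin at 101.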

In the BothStrandsPosition case, where the biology of the region is symmetrical, the start and end can be mirrored because there is no 'definitive' biological start. The default you suggest (start < end) should then be documented as the preferred option.
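For a consumer that has to reconcile mirrored regions, a small Python sketch of the start < end default could look like this. It assumes a both-strands region is fully described by two integer coordinates on the same reference sequence. The function names are hypothetical and only illustrate the idea.

```python
from typing import Tuple


def canonical_both_strands(begin: int, end: int) -> Tuple[int, int]:
    """Return the coordinates in numerical order of the reference sequence."""
    return (begin, end) if begin <= end else (end, begin)


def same_both_strands_region(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Treat two both-strands regions as equal if they only differ by mirroring."""
    return canonical_both_strands(*a) == canonical_both_strands(*b)
```

Normalizing on read means a producer may emit either order, and the two equivalent descriptions still compare as the same region.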
