[FEATURE REQUEST] Join overlap by ranges as well as metadata #107

shalvichirmade opened this issue Feb 8, 2024 · 0 comments

shalvichirmade commented Feb 8, 2024

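Feature request: add an argument to join_overlap_intersect (and the related join_overlap_* functions, such as join_overlap_left used below) so that ranges are joined only when they overlap and the values in chosen metadata columns also match.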

For some context, here is a hypothetical example:
I have two GRanges objects, one for introns and one for transcripts.

## intron GRanges
intron
# GRanges object with 1 range and 2 metadata columns:
#     seqnames              ranges strand |     type     transcript_id
#        <Rle>           <IRanges>  <Rle> | <factor>       <character>
#            1 100149098-100152384      - |   intron ENST00000370137.6


## transcript GRanges
trans
# GRanges object with 2 ranges and 2 metadata columns:
#     seqnames              ranges strand |   transcript_name   gene_name
#        <Rle>           <IRanges>  <Rle> |       <character> <character>
#            1 100148448-100178256      - | ENST00000370137.6      LRRC39
#            1 100133163-100150496      - | ENST00000370141.8      TRMT13

I want to join these GRanges objects so I can annotate the intron GRanges with gene_name metadata.

However, when I use join_overlap_left, the intron range overlaps both rows of trans, so the result contains two rows, one per overlapping transcript.

intron <- join_overlap_left(intron, trans)
intron
# GRanges object with 2 ranges and 4 metadata columns:
#     seqnames              ranges strand |     type     transcript_id   transcript_name   gene_name
#        <Rle>           <IRanges>  <Rle> | <factor>       <character>       <character> <character>
#            1 100149098-100152384      - |   intron ENST00000370137.6 ENST00000370137.6      LRRC39
#            1 100149098-100152384      - |   intron ENST00000370137.6 ENST00000370141.8      TRMT13

The desired output would contain only the match with the trans row where trans$transcript_name == "ENST00000370137.6", i.e. the row whose transcript_name equals the intron's transcript_id.

Here, the overlap should be based on the range as well as the metadata columns:

  • intron$transcript_id
  • trans$transcript_name
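
For reference, the behaviour I am after can be approximated today by filtering after the join. This is only a sketch of a workaround, not the requested feature: it rebuilds the hypothetical objects above by hand, and the result name `annotated` is made up for illustration. It also assumes that dropping introns with no matching transcript is acceptable, since the comparison evaluates to NA for unmatched rows and `filter()` removes them.

```r
library(plyranges)

# Rebuild the hypothetical example objects from this issue.
intron <- GRanges(
  seqnames = "1",
  ranges = IRanges(start = 100149098, end = 100152384),
  strand = "-",
  type = factor("intron"),
  transcript_id = "ENST00000370137.6"
)

trans <- GRanges(
  seqnames = "1",
  ranges = IRanges(
    start = c(100148448, 100133163),
    end   = c(100178256, 100150496)
  ),
  strand = "-",
  transcript_name = c("ENST00000370137.6", "ENST00000370141.8"),
  gene_name = c("LRRC39", "TRMT13")
)

# Join on range overlap first, then keep only the rows where the
# metadata columns agree. Rows without a match are dropped here.
annotated <- intron |>
  join_overlap_left(trans) |>
  filter(transcript_id == transcript_name)

annotated$gene_name
```

A built-in argument would avoid materialising the unwanted overlap rows before filtering, which matters when many transcripts overlap each intron.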

R session information

Remember to include your full R session information.

options(width = 120)
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.3
 system   x86_64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Toronto
 date     2024-02-08
 rstudio  2023.06.1+524 Mountain Hydrangea (desktop)
 pandoc   NA
@shalvichirmade shalvichirmade changed the title Join overlap by ranges as well as metadata [FEATURE REQUEST] Join overlap by ranges as well as metadata Feb 8, 2024