240_Lee_Plyranges.Rmd

# 240: Fluent genomic data analysis with plyranges

## Instructor names and contact information

* Stuart Lee (lee.s@wehi.edu.au)
* Michael Lawrence (michafla@gmail.com)

## Workshop Description

In this workshop, we will give an overview of how to perform low-level
analyses of genomic data using the  grammar of genomic data transformation
defined in the plyranges package. We will cover:

- introduction to GRanges
- overview of the core verbs for arithmetic, restriction, and aggregation of GRanges objects
- performing joins between GRanges objects
- designing pipelines to quickly explore data from AnnotationHub
- reading BAM and other file types as GRanges objects

The workshop will be a computer lab, in which the participants will be able to 
ask questions and interact with the instructors.

### Pre-requisites

This workshop is self-contained however familiarity with the following would be useful:

- plyranges [vignette](https://sa-lee.github.io/plyranges/articles/an-introduction.html)
- the GenomicRanges and IRanges packages
- tidyverse approaches to data analysis

### Workshop Participation

Students will work through an Rmarkdown document while the instructors respond 
to any questions they have.

### _R_ / _Bioconductor_ packages used

- plyranges
- airway
- AnnotationHub
- GenomicRanges
- IRanges
- S4Vectors

### Time outline


| Activity                     | Time |
|------------------------------|------|
| Overview of GRanges          | 5m   |
| The plyranges grammar        | 20m  |
| I/O and data pipelines       | 20m  |

## Workshop goals and objectives

### Learning goals

* Understand that GRanges follows tidy data principles
* Apply the plyranges grammar to genomic data analysis

### Learning objectives

* Use AnnotationHub to find and summarise data
* Read files into R as GRanges objects
* Perform coverage analysis 
* Build data pipelines for analysis based on GRanges

## Workshop

```{r setup, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, 
                      warning = FALSE, 
                      fig.width = 5, 
                      fig.height = 3, 
                      fig.align = "center",
                      fig.path = "./Lee_Plyranges/")

```    

## Introduction 

### What is plyranges?

The plyranges package is a domain specific language (DSL) built on top of the 
IRanges and GenomicRanges packages (@Lee2018; @Lawrence2013-wg). 
It is designed to quickly and coherently 
analyse genomic data in the form of GRanges objects (more on those later!) and 
from a wide variety of genomic data file types. For users who are familiar with 
the tidyverse, the grammar that plyranges implements will look familiar but with
a few modifications for genomic specific tasks.

### Why use plyranges?

The grammar that plyranges develops is helpful for reasoning
about genomics data analysis, and provides a way of developing 
short readable analysis pipelines. We have tried to emphasise consistency 
and code readability by following the design principles outlined by @Green1996-qg.

One of the goals of plyranges is to provide an alternative entry point 
to analysing genomics data with Bioconductor, especially for R beginners and 
R users who are more familiar with the tidyverse approach to data analysis. 
As a result, we have de-emphasised the use of more complicated data structures 
provided by core Bioconductor packages that are useful for programming with.

### Who is this workshop for?

This workshop is intended for new users of Bioconductor, users who are
interested to learn about grammar based approaches for data analysis,
and users who are interested in learning how to use R to perform analyses
like those available in the command line packages BEDTools (@Quinlan2010-gc).

If that's you, let's begin!

## Setup

To participate in this workshop you'll need to have R >= 3.5 and install
the plyranges, AnnotationHub, and airway Bioconductor 3.7 packages 
(@R-AnnotationHub; @R-airway). You can achieve this
by installing the BiocManager package from CRAN, loading it then running the
install command:

```{r, eval = FALSE}
install.packages("BiocManager")
library(BiocManager)
install(c("plyranges", "AnnotationHub", "airway"))
```

## What are GRanges objects?


```{r GRanges, echo = FALSE, fig.cap="An illustration of a GRanges data object for a single sample from an RNA-seq experiment. The core components of the object include a seqname column (representing the chromosome), a ranges column which consists of start and end coordinates for a genomic region, and a strand identifier (either positive, negative, or unstranded). Metadata are included as columns to the right of the dotted line as annotations (gene-id) or range level covariates (score).", out.width="\\textwidth"}
knitr::include_graphics("./Lee_Plyranges/GRanges.png")
```


The plyranges package is built on the core Bioconductor data structure
GRanges. It is very similar to the base R data.frame but with appropriate
semantics for a genomics experiment: it has fixed columns
for the chromosome, start and end coordinates, and the strand, along
with an arbitrary set of additional columns, consisting of
measurements or metadata specific to the data type or experiment
(figure \@ref(fig:GRanges)).  

GRanges balances flexibility with formal
constraints, so that it is applicable to virtually any genomic
workflow, while also being semantically rich enough to support
high-level operations on genomic ranges. As a core data structure,
GRanges enables compatibility between plyranges and the rest of
Bioconductor.

Since a GRanges object is similar to a data.frame, we can use plyranges
to construct a GRanges object from a data.frame. We'll start by supposing we 
have a data.frame of genes from the yeast genome:

```{r}
library(plyranges, quietly = TRUE)
genes <- data.frame(seqnames = "VI", 
                    start = c(3322, 3030, 1437,  5066, 6426, 836),
                    end = c(3846, 3338, 2615, 5521, 7565, 1363),
                    strand = c("-", "-", "-", "+", "+", "+"),
                    gene_id=c("YFL064C", "YFL065C", "YFL066C", 
                              "YFL063W", "YFL062W", "YFL067W"),
                    stringsAsFactors = FALSE)
gr <- as_granges(genes)
gr
```

The `as_granges` method takes a data.frame and allows you to quickly convert
it to a GRanges object (and can you can also specify which columns in the 
data.frame correspond to the columns in the GRanges).

A GRanges object follows tidy data principles: it is a rectangular
table corresponding to a single biological context (@Wickham2014-jc). 
Each row contains
a single observation and each column is a variable describing the
observations. In the example above, each row corresponds to a single gene, and
each column contains information about those genes. As GRanges are tidy,
we have constructed plyranges to follow and extend the grammar in the R package
dplyr.

## The Grammar

Here we provide a quick overview of the functions available in plyranges
and illustrate their use with some toy examples (see 
\@ref(appendix) for an overview). In the final section we
provide two worked examples (with exercises) that show you can use
plyranges to explore publicly available genomics data and perform coverage
analysis of BAM files.

### Core verbs

The plyranges grammar is simply a set of verbs that define actions to 
be performed on a GRanges (for a complete list see the appendix). 
Verbs can be composed together using the pipe 
operator, `%>%`, which can be read as 'then'. Here's a simple pipeline: first
we will add two columns, one corresponding to the gene_type and another with
the GC content (which we make up by drawing from a uniform distribution). 
Second we will remove genes if they have a width less than 400bp.

```{r}
set.seed(2018-07-28)
gr2 <- gr %>%
  mutate(gene_type = "ORF",
         gc_content = runif(n())) %>%
  filter(width > 400)
gr2
```

The `mutate()` function is used to add columns, here we've added one column
called gene_type where all values are set to "ORF" (standing for open reading
frame) and another called gc_content with random uniform values. The `n()`
operator returns the number of ranges in GRanges object, but can only
be evaluated inside of one of the plyranges verbs. 

The `filter()` operation returns ranges if the expression evaluates to TRUE. 
Multiple expressions can be composed together and will be evaluated as `&`

```{r}
gr2 %>%
  filter(strand == "+", gc_content > 0.5)
# is the same as using `&`
gr2 %>% 
  filter(strand == "+" & gc_content > 0.5)
# but different from using or '|'
gr2 %>%
  filter(strand == "+" | gc_content > 0.5)
```

Now that we have some measurements over our genes, we are most likely
interested in performing the favourite tasks of a biological data scientist:
taking averages and counting. This is achieved with the `summarise()` verb
which will return a DataFrame object (Why is this the case?).

```{r}
gr2 %>%
  summarise(avg_gc = mean(gc_content), 
            n = n())
```

which isn't very exciting when performed without `summarise()`'s best
friend the `group_by()` operator:

```{r}
gr2 %>%
  group_by(strand) %>%
  summarise(avg_gc = mean(gc_content),
            n = n())

```

The `group_by()` operator causes `plyranges` verbs to behave differently.
Instead of acting on all the ranges in a GRanges object, the verbs act within
each group of ranges defined by the values in the grouping column(s).
The `group_by()` operator does not change the appearance the of a GRanges object
(well for the most part):

```{r}
by_strand <- gr2 %>%
  group_by(strand)
by_strand
```

Now any verb we apply to our grouped GRanges, acts on each partition:

```{r}
by_strand %>%
  filter(n() > 2)
by_strand %>%
  mutate(avg_gc_strand = mean(gc_content))
```

To remove grouping use the `ungroup()` verb:
```{r}
by_strand %>%
  ungroup()
```

Finally, metadata columns can be selected using the `select()` verb:

```{r}
gr2 %>%
  select(gene_id, gene_type)
# is the same as not selecting gc_content
gr2 %>%
  select(-gc_content)
# you can also select by metadata column index
gr2 %>%
  select(1:2)
```

### Verbs specific to GRanges

We have seen how you can perform restriction and aggregation on GRanges,
but what about specific actions for genomics data analysis, like arithmetic,
nearest neighbours or finding overlaps?

#### Arithmetic

Arithmetic operations transform range coordinates, as defined by their
_start_, _end_ and _width_  columns. These three variables are mutually
dependent so we have to take care when modifying them. 
For example, changing the _width_ column needs to change
either the _start_, _end_ or both to preserve integrity of the
GRanges:

```{r}
# by default setting width will fix the starting coordinate
gr %>%
  mutate(width = width + 1)
```

We introduce the `anchor_direction()` operator to clarify these 
modifications. Supported anchor points include the start `anchor_start()`, 
end `anchor_end()` and midpoint `anchor_center()`:

```{r}
gr %>%
  anchor_end() %>%
  mutate(width = width * 2)

gr %>%
  anchor_center() %>%
  mutate(width = width * 2)
```

Note that the anchoring modifier introduces an important departure from the 
IRanges and GenomicRanges packages: by default we ignore the strandedness of 
a GRanges object, and instead we provide verbs that make stranded actions 
explicit. In this case of anchoring we provide the `anchor_3p()` and 
`anchor_5p()` to perform anchoring on the 3' and 5' ends of a range.

The table in the appendix provides a complete list of arithmetic options
available in plyranges.

#### Genomic aggregation

There are two verbs that can be used to aggregate over nearest neighbours:
`reduce_ranges()` and `disjoin_ranges()`.
The reduce verb merges overlapping and neighbouring ranges:
```{r}
gr %>% reduce_ranges()
```
We could find out which genes are overlapping each other by aggregating
over the gene_id column and storing the result in a List column:
```{r}
gr %>% 
  reduce_ranges(gene_id = List(gene_id))
```

The disjoin verb takes the union of end points over all ranges, and results
in an expanded range

```{r}
gr %>% 
  disjoin_ranges(gene_id = List(gene_id))
```

You may have noticed that the resulting range are now unstranded, to take
into account stranded features use the _directed_ prefix.

#### Overlaps  

Another important class of operations for genomics data analysis is finding
overlaps or nearest neighbours. Here's where we will 
introduce the `plyranges` join operators and the overlap aggregation 
operators.

Now let's now suppose we have some additional measurements that we have 
obtained from a new experiment on yeast. These measurements are for three different
replicates and represent single nucleotide or insertion deletion intensities 
from an array. Our
collaborator has given us three different data.frames with the data but they
all have names inconsistent with the GRanges data structure. Our goal is to 
unify these into a single GRanges object with column for each measurement
and column for the sample identifier.

Here's our data:

```{r}
set.seed(66+105+111+99+49+56)

pos <- sample(1:10000, size = 100)
size <- sample(1:3, size = 100, replace = TRUE)
rep1 <- data.frame(chr = "VI", 
                  pos = pos,
                  size = size,
                  X = rnorm(100, mean = 2),
                  Y = rnorm(100, mean = 1))

rep2 <- data.frame(chrom = "VI", 
                  st = pos,
                  width = size,
                  X = rnorm(100, mean = 0.5, sd = 3),
                  Y = rnorm(100, sd = 2))

rep3 <- data.frame(chromosome = "VI", 
                  start = pos,
                  width = size,
                  X = rnorm(100, mean = 2, sd = 3),
                  Y = rnorm(100, mean = 4, sd = 0.5))
```

For each replicate we want to construct a GRanges object:

```{r}
# we can tell as_granges which columns in the data.frame
# are the seqnames, and range coordinates
rep1 <- as_granges(rep1, seqnames = chr, start = pos, width = size)
rep2 <- as_granges(rep2, seqnames = chrom, start = st)
rep3 <- as_granges(rep3, seqnames = chromosome)
```

And to construct our final GRanges object we can bind all our replicates 
together:

```{r}
intensities <- bind_ranges(rep1, rep2, rep3, .id = "replicate")
# sort by the starting coordinate
arrange(intensities, start)
```

Now we would like to filter our positions if they overlap one of the genes
we have, one way we could achieve this is with the `filter_by_overlaps()`
operator:

```{r}
olap <- filter_by_overlaps(intensities, gr)
olap
```

Another option would be to perform a join operation. A join
acts on two GRanges objects, a query and a subject. The join operator
retains the metadata from the query and subject
ranges. All join operators generate a set of hits based on overlap or 
proximity of ranges and use those hits to merge the two datasets in different
ways (figure \ref{fig:olaps-fig}, provides an overview of the overlap joins). 
We can further restrict the matching by whether the query is completely _within_
the subject, and adding the _directed_ suffix ensures that 
matching ranges have the same direction (strand).

```{r olaps-fig, echo = FALSE, out.width="\\textwidth", fig.cap="Illustration of the three overlap join operators. Each join takes query and subject range as input (black and light gray rectangles, respectively). An index for the join is computed, returning a Hits object, which contains the indices of where the subject overlaps the query range. This index is used to expand the query ranges by where it was 'hit' by the subject ranges. The join semantics alter what is returned: for an \\textbf{inner} join the query range is returned for each match, for an \\textbf{intersect} the intersection is taken between overlapping ranges, and for a \\textbf{left} join all query ranges are returned even if the subject range does not overlap them. This principle is gnerally applied through the plyranges package for both overlaps and nearest neighbour operations."}
knitr::include_graphics("./Lee_Plyranges/olap.png")
```

Going back to our example, the overlap inner join will return all intensities
that overlap a gene and propagate which gene_id a given intensity belongs to:
```{r}
olap <- join_overlap_inner(intensities, gr)
```

If we wanted to return all intensities regardless of overlap we could use
the left join, and  a missing value will be propagated to the gene_id
column if there isn't any overlap.

```{r}
join_overlap_left(intensities, gr)
```

If we are interested in finding how much overlap there is we could use 
an intersect join. For example we could compute the fraction of overlap 
of the genes with themselves:

```{r}
gr %>%
  mutate(gene_length = width) %>% 
  join_overlap_intersect(gr, suffix = c(".query", ".subject")) %>% 
  filter(gene_id.query != gene_id.subject) %>%
  mutate(folap = width / gene_length)
```
### Exercises

There are of course many more functions available in plyranges but hopefully
the previous examples are sufficient to give you an idea of how to incorporate
plyranges into your genomic workflows.

Here's a few exercises to help you familiar with the verbs:

1. Find the average intensity of the X and Y measurements for each each replicate
over all positions in the intensities object
1. Add a new column to the intensities object that is the distance from
each position to its closest gene (hint IRanges::distance)
1. Find flanking regions downstream of the genes in gr that have width of 8bp
(hint: type: flank_ and see what comes up!)
1. Are any of the intensities positions within the flanking region? (use an
overlap join to find out!)

## Data import and creating pipelines

Here we provide some realistic examples of using plyranges to import genomic
data and perform exploratory analyses.

### Worked example: exploring BigWig files from AnnotationHub

In the workflow of ChIP-seq data analysis, we are often interested in 
finding peaks from islands of coverage over a chromosome. Here we will use plyranges
to explore ChiP-seq data from the Human Epigenome Roadmap project @Roadmap-Epigenomics-Consortium2015-pr.

#### Extracting data from AnnotationHub

This data is available on Bioconductor's AnnotationHub. First we construct
an AnnotationHub, and then `query()` for all bigWigFiles related to 
the project that correspond to the following conditions:

1. are from methylation marks (H3K4ME in the title)
2. correspond to primary T CD8+ memory cells from peripheral blood
3. correspond to unimputed log10 P-values 

First we construct a hub that contains all references to the EpigenomeRoadMap
data and extract the metadata as a data.frame:

```{r}
library(AnnotationHub)
library(magrittr)
ah <- AnnotationHub()
roadmap_hub <- ah %>% 
  query("EpigenomeRoadMap") 

# extract2 is just a call to `[[` 
metadata <- ah %>% 
  query("Metadata") %>% 
  extract2(names(.))

head(metadata)
```

To find out the name of the sample corresponding to 
primary memory T-cells we can filter the data.frame. We extract the 
sample ID corresponding to our filter.

```{r}
primary_tcells <-  metadata %>% 
  filter(ANATOMY == "BLOOD") %>% 
  filter(TYPE == "PrimaryCell") %>% 
  filter(EDACC_NAME == "CD8_Memory_Primary_Cells") %>% 
  extract2("EID") %>% 
  as.character()
primary_tcells
```

Now we can take our roadmap hub and query it based on our other conditions:

```{r}
methylation_files <- roadmap_hub %>% 
  query("BigWig") %>% 
  query(primary_tcells) %>% 
  query("H3K4ME[1-3]") %>% 
  query("pval.signal")

methylation_files
```

So we'll take the first two entries and download them as BigWigFiles:

```{r}
bw_files <- lapply(c("AH33454", "AH33455"), function(id) ah[[id]]) 
names(bw_files) <- c("HK34ME1", "HK34ME3")
```

We have our desired BigWig files so now we can we can start analysing them.

#### Reading BigWig files

For this analysis, we will call peaks from islands of scores
over chromosome 10.

First, we extract the genome information from the first BigWig file and filter
to get the range for chromosome 10. This range will be used as a filter when 
reading the file.

```{r load-bw}
chr10_ranges <- bw_files %>% 
  extract2(1L) %>% 
  get_genome_info() %>%
  filter(seqnames == "chr10")
```

Then we read the BigWig file only extracting scores if they overlap chromosome
10. We also add the genome build information to the resulting ranges. This
book-keeping is good practice as it ensures the integrity of any 
downstream operations such as finding overlaps.

```{r}
# a function to read our bigwig files
read_chr10_scores <- function(file) {
  read_bigwig(file, overlap_ranges = chr10_ranges) %>% 
    set_genome_info(genome = "hg19")
}
# apply the function to each file
chr10_scores <- lapply(bw_files, read_chr10_scores) 
# bind the ranges to a single GRAnges object
# and add a column to identify the ranges by signal type
chr10_scores <- bind_ranges(chr10_scores, .id = "signal_type")
chr10_scores
```
Since we want to call peaks over each signal type, we will create a grouped
GRanges object:

```{r}
chr10_scores_by_signal <- chr10_scores %>% 
    group_by(signal_type)
```

We can then filter to find the coordinates of the peak containing the maximum
score for each signal.
We can then find a 5000 nt region centred around the maximum position 
by anchoring and modifying the width. 

```{r max-score-region}
chr10_max_score_region <- chr10_scores_by_signal %>%
  filter(score == max(score)) %>% 
  ungroup() %>% 
  anchor_center() %>%
  mutate(width = 5000)
```

Finally, the overlap inner join is used to restrict the chromosome 10
coverage islands, to the islands that are contained in the 5000nt region that 
surrounds the max peak for each signal type.

```{r peak_region}
peak_region <- chr10_scores %>%
    join_overlap_inner(chr10_max_score_region) %>% 
    filter(signal_type.x == signal_type.y) 
# the filter ensures overlapping correct methlyation signal
```

#### Exercises

1. Use the `reduce_ranges()` function to find all peaks for each signal type.
2. How could you annotate the scores to find out which genes overlap
each peak found in 1.?
3. Plot a 1000nt window centred around the maximum scores for each signal type 
using the `ggbio` or `Gviz` package.

### Worked example: coverage analysis of BAM files

A common quality control check in a genomics workflow is to perform coverage
analysis over features of interest or over the entire genome (again
we see the bioinformatician's love of counting things). Here we use the 
airway package to compute coverage histograms and show how you can read BAM
files into memory as GRanges objects with plyranges.

First let's gather all the BAM files available to use in airway (see
`browseVignettes("airway")` for more information about the data and how it 
was prepared):

```{r airway_bams}
bfs <- system.file("extdata", package = "airway") %>% 
    dir(pattern = ".bam", 
        full.names = TRUE)
# get sample names (everything after the underscore)
names(bfs) <- bfs %>% 
    basename() %>% 
    sub("_[^_]+$", "", .)
```

To start let's look at a single BAM file, we can compute the coverage of 
the alignments over all contigs in the BAM as follows:

```{r}
first_bam_cvg <- bfs %>% 
    extract2(1) %>% 
    compute_coverage()
first_bam_cvg
```

Here the coverage is computed without the entire BAM file being read into
memory. Notice that the score here is the count of the number of alignments
that cover a given range. To compute a coverage histogram, that is the number
of number of bases that have a given coverage score for each contig we can
simply use `summarise()`:

```{r}
first_bam_cvg %>% 
    group_by(seqnames, score) %>% 
    summarise(n_bases_covered = sum(width))
```

For RNA-seq experiments we are often interested in splitting up alignments
based on whether the alignment has skipped a region from the reference (
that is, there is an "N" in the cigar string, indicating an intron). 

In plyranges this can be achieved with the `chop_by_introns()`, that will
split up an alignment if there's an intron. This results in a grouped ranges 
object.

To begin we can read the BAM file using `read_bam()`, this does not read the 
entire BAM into memory but instead waits for further functions to be applied 
to it.
```{r}
# no index present in directory so we use index = NULL
split_bam <- bfs %>% 
    extract2(1) %>% 
    read_bam(index = NULL) 
# nothing has been read
split_bam
```

For example we can select elements from the BAM file 
to be included in the resulting GRanges to load alignments into memory:

```{r}
split_bam %>% 
    select(flag)
```

Finally, we can split our alignments using the `chop_by_introns()`:

```{r}
split_bam <- split_bam %>% 
    chop_by_introns()
split_bam
# look at all the junction reads 
split_bam %>% 
    filter(n() >= 2)
```


#### Exercises 

1. Compute the total depth of coverage across all features.
1. How could you compute the proportion of bases covered over an entire genome?
(hint: `get_genome_info` and `S4Vectors::merge`)
1. How could you compute the strand specific genome wide coverage?
1. Create a workflow for computing the strand specific coverage for all BAM
files.
1. For each sample plot total breadth of coverage against the number of bases
covered faceted by each sample name.

## What's next?

The grammar based approach for analysing data provides a consistent approach
for analysing genomics experiments with Bioconductor. We have explored
how plyranges can enable you to perform common analysis tasks required of 
a bioinformatician. As plyranges is still a new and growing package, there is 
definite room for improvement (and likely bugs), if you have any problems using
it or think things could be easier or clearer please file an issue on 
[github](https://github.com/sa-lee/plyranges). Contributions from the Bioconductor
community are also more than welcome!


## Appendix {#appendix}

Category               Verb                            Description
--------------------- ------------------------------- --------------------------------------------------------------------
                        ***summarise()***               aggregate over column(s)
  Aggregation           *disjoin\_ranges()*             aggregate column(s) over the union of end coordinates
                        *reduce\_ranges()*              aggregate column(s) by merging overlapping and neighbouring ranges
                        ***mutate()***                  modifies any column
                        ***select()***                  select columns
  Arithmetic (Unary)    ***arrange()***                 sort by columns
                        *stretch()*                     extend range by fixed amount
                        *shift\_(direction)*            shift coordinates
                        *flank\_(direction)*            generate flanking regions
                        *%intersection%*                row-wise intersection
                        *%union%*                       row-wise union
                        *compute\_coverage*             coverage over all ranges
  Arithmetic (Binary)   *%setdiff%*                     row-wise set difference
                        *between()*                     row-wise gap range
                        *span()*                        row-wise spanning range
                        *join\_overlap\_()*             merge by overlapping ranges
                        *join\_nearest*                 merge by nearest neighbour ranges
                        *join\_follow*                  merge by following ranges
  Merging               *join\_precedes*                merge by preceding ranges
                        *union\_ranges*                 range-wise union
                        *intersect\_ranges*             range-wise intersect
                        *setdiff\_ranges*               range-wise set difference
                        *complement\_ranges*            range-wise union
                        *anchor\_direction()*           fix coordinates at direction
  Modifier              ***group\_by()***               partition by column(s)
                        *group\_by\_overlaps()*         partition by overlaps
                        ***filter()***                  subset rows
  Restriction           *filter\_by\_overlaps()*        subset by overlap
                        *filter\_by\_non\_overlaps()*   subset by no overlap

  : Overview of the plyranges grammar. The core verbs are briefly
  described and categorised into one of: aggregation, unary or binary
  arithmetic, merging, modifier, or restriction. A verb is given bold
  text if its origin is from the dplyr grammar.