README.Rmd

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tidysq <a href='https://biogenies.info/tidysq/'><img src='man/figures/logo.png' align="right" height="139" /></a>

<!-- badges: start -->
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidysq)](https://cran.r-project.org/package=tidysq)
  [![Github Actions Build Status](https://github.com/BioGenies/tidysq/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/BioGenies/tidysq/actions)
  [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

## Overview

`tidysq` contains tools for analysis and manipulation of biological sequences (including amino acid and nucleic acid -- e.g. RNA, DNA -- sequences). Two major features of this package are:

- effective compression of sequence data, allowing to fit larger datasets in **R**,

- compatibility with most of `tidyverse` universe, especially `dplyr` and `vctrs` packages, making analyses *tidier*.

## Getting started

[Try our quick start vignette](http://biogenies.info/tidysq/articles/quick-start.html) or [our exhaustive documentation](http://biogenies.info/tidysq/reference/index.html).

## Installation

The easiest way to install `tidysq` package is to download its latest version from CRAN repository:

```{r, eval=FALSE}
install.packages("tidysq")
```

Alternatively, it is possible to download the development version directly from GitHub repository:

```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("BioGenies/tidysq")
```

## Example usage

```{r, message=FALSE}
library(tidysq)
```

```{r}
file <- system.file("examples", "example_aa.fasta", package = "tidysq")
sqibble <- read_fasta(file)
sqibble

sq_ami <- sqibble$sq
sq_ami

# Subsequences can be extracted with bite()
bite(sq_ami, 5:10)

# There are also more traditional functions
reverse(sq_ami)

# find_motifs() returns a whole tibble of useful informations
find_motifs(sqibble, "^VHX")
```

An example of `dplyr` integration:

```{r, message=FALSE}
library(dplyr)
# tidysq integrates well with dplyr verbs
sqibble %>%
  filter(sq %has% "VFF") %>%
  mutate(length = get_sq_lengths(sq))
```

## Citation

For citation type:

```{r, eval=FALSE}
citation("tidysq")
```

or use:

Michal Burdukiewicz, Dominik Rafacz, Laura Bakala, Jadwiga Slowik, Weronika Puchala, Filip Pietluch, Katarzyna Sidorczuk, Stefan Roediger and Leon Eyrich Jessen (2021). tidysq: Tidy Processing and Analysis of Biological Sequences. R package version 1.1.3.