---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(genogamesh)
```

# genogamesh

<!-- badges: start -->
<!-- badges: end -->


## installation

You can install the development version of `genogamesh` like so:

``` r
devtools::install_github("william-swl/genogamesh")
```

## parse bioinformatic data

- parse the output of SingleR

```{r parse_bio-parse_SingleR}
# SingleR(test, ref) %>% parse_SingleR()
```

- parse somatic hypermutation from IgBlast output

```{r parse_bio-parse_IgBlast_shm}
# parse_IgBlast_shm('igblast_out.txt')
```

- parse sequences from CellRanger vdj output

```{r parse_bio-parse_CellRanger_vdjseq}
# parse_CellRanger_vdjseq(df)
# parse_CellRanger_vdjseq(df, file='seq.csv')
# parse_CellRanger_vdjseq(df, file='seq.fa', fa_content='seq_orf_nt')
```

- parse sequences from ANARCI vdj output

```{r parse_bio-parse_ANARCI_aaseq}
# parse_ANARCI_aaseq(df, chain='H')
# parse_ANARCI_aaseq(df, chain='L')

# keep the antibody numbering
# parse_ANARCI_aaseq(df, chain='H', keep_number=TRUE)
```

- parse a VCF file with the help of a reference genome and annotations. This
function is still under development: it cannot process more than 3 nt
substitutions in a single VCF record, and it cannot process indels

```{r parse_bio-parse_vcf}
# vcf <- read_vcf(...)
# fa <- read_fasta(...)
# gff <- read_gff(...)
# parse_vcf(vcf, fa, gff)
```


## shortcuts for bioinformatic pipelines

- add SingleR cell type annotations to a Seurat object

```{r shortcut_bio-SingleR_SE}
# SE <- SingleR_SE(SE, SEref)
```

- run dimensionality reduction on a raw Seurat object created from a read count
matrix, including normalization, variable feature selection, scaling, PCA and UMAP

```{r shortcut_bio-reduction_SE}
# SE <- reduction_SE(SE)
```

- translate nucleotide sequences into amino acids, starting from the first character

```{r shortcut_bio-nt2aa}
nt2aa(c("ATGAAA", "TTGCCC", "CTGTTT"))
```
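
As a rough illustration of what translating from the first character means, the sketch below reads each sequence in consecutive codons starting at position 1 and looks them up in the standard genetic code. It uses base R only. `translate_sketch()` and `genetic_code` are hypothetical names made up for this example and are not part of `genogamesh`, and the handling of trailing bases and unknown codons is an assumption, not a description of how `nt2aa()` behaves.

```r
# Build the standard genetic code as a named lookup vector.
# Codons are enumerated in TCAG order: first base slowest, third base fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
genetic_code <- setNames(amino_acids, codons)

# Hypothetical helper: translate each sequence from its first character.
# Assumptions: incomplete trailing codons are dropped, unknown codons become "X".
translate_sketch <- function(nt) {
  vapply(nt, function(s) {
    s <- toupper(s)
    n_codons <- nchar(s) %/% 3
    if (n_codons == 0) {
      return("")
    }
    starts <- seq(1, by = 3, length.out = n_codons)
    aa <- genetic_code[substring(s, starts, starts + 2)]
    aa[is.na(aa)] <- "X"
    paste(aa, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

translate_sketch(c("ATGAAA", "TTGCCC", "CTGTTT"))
```

For the three sequences used above, this sketch returns `"MK"`, `"LP"` and `"LF"`.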

- build an antigen map from serum titer data

```{r shortcut_bio-antigen_map}
# antigen_map(data, sera_meta, ag_meta, seed=14)
```

## IO

```
read_fasta()

write_fasta()

read_vcf()

read_gff()
```


## S4 classes in `genogamesh`

### mutstr

- an S4 class for manipulating mutation strings
- supports set operations

``` {r mutstr}
raw_mut_string <- c(
  variant1 = "T10I,D20N,Q30E,A40T,P50L,G60R",
  variant2 = "T10I,D20-,Q30E,A40T,P50L,G60R,S80R",
  variant3 = "T10A,D20G,Q30E,A40T,P50L,G60R"
)

m <- mutstr(raw_mut_string, sep = ",")

m

names(m)

mstr(m)

mut(m)

m[1:2]

m[[2]]

intersect(m, m[1])

setdiff(m, m[1])

union(m, m[1])
```
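
The idea behind the set operations can be pictured with plain character vectors: split each mutation string on its separator, then compare the resulting sets of mutations. The following is a base R sketch of that idea only. It does not use the `mutstr` class, `v1` and `v2` are hypothetical names for this example, and it makes no claim about how `genogamesh` implements or prints these operations.

```r
# Split two of the mutation strings from the example above into mutations.
v1 <- strsplit("T10I,D20N,Q30E,A40T,P50L,G60R", ",")[[1]]
v2 <- strsplit("T10I,D20-,Q30E,A40T,P50L,G60R,S80R", ",")[[1]]

# Mutations shared by both variants
shared <- intersect(v2, v1)

# Mutations found only in variant2
only_v2 <- setdiff(v2, v1)

# All mutations seen in either variant, without duplicates
all_muts <- union(v2, v1)

shared
only_v2
all_muts
```

Here `only_v2` contains `"D20-"` and `"S80R"`, the two mutations that variant2 does not share with variant1.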