---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(genogamesh)
```
# genogamesh
<!-- badges: start -->
<!-- badges: end -->
## installation
You can install the development version of `genogamesh` like so:
``` r
devtools::install_github("william-swl/genogamesh")
```
## parse bioinformatic data
- parse the output of SingleR
```{r parse_bio-parse_SingleR}
# SingleR(test, ref) %>% parse_SingleR()
```
- parse somatic hypermutation from IgBLAST output
```{r parse_bio-parse_IgBlast_shm}
# parse_IgBlast_shm('igblast_out.txt')
```
- parse sequences from CellRanger vdj output
```{r parse_bio-parse_CellRanger_vdjseq}
# parse_CellRanger_vdjseq(df)
# parse_CellRanger_vdjseq(df, file='seq.csv')
# parse_CellRanger_vdjseq(df, file='seq.fa', fa_content='seq_orf_nt')
```
- parse sequences from ANARCI vdj output
```{r parse_bio-parse_ANARCI_aaseq}
# parse_ANARCI_aaseq(df, chain='H')
# parse_ANARCI_aaseq(df, chain='L')
# keep the antibody numbering
# parse_ANARCI_aaseq(df, chain='H', keep_number=TRUE)
```
- parse a vcf file with the help of a reference genome and annotations. This
function is still under development: it cannot process more than 3 nt
substitutions in a single vcf record row, and it cannot process indels
```{r parse_bio-parse_vcf}
# vcf <- read_vcf(...)
# fa <- read_fasta(..)
# gff <- read_gff(...)
# parse_vcf(vcf, fa, gff)
```
## shortcuts for bioinformatic pipelines
- add SingleR cell type annotations to a Seurat object
```{r shortcut_bio-SingleR_SE}
# SE <- SingleR_SE(SE, SEref)
```
- run dimensionality reduction on a raw Seurat object created from a read count
matrix, including normalization, variable feature calling, scaling, PCA and UMAP
```{r shortcut_bio-reduction_SE}
# SE <- reduction_SE(SE)
```
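
For orientation, the steps listed above correspond roughly to the standard Seurat workflow. The sketch below is an assumption about what such a wrapper could look like, not the `genogamesh` source: `reduction_sketch` and its `dims` argument are hypothetical names, and the Seurat defaults are used throughout.

```r
# Hypothetical sketch of a standard Seurat workflow; not the genogamesh source.
# The Seurat package is only needed when the function is actually called.
reduction_sketch <- function(SE, dims = 1:30) {
  SE <- Seurat::NormalizeData(SE)         # log-normalize the raw counts
  SE <- Seurat::FindVariableFeatures(SE)  # select highly variable features
  SE <- Seurat::ScaleData(SE)             # center and scale the variable features
  SE <- Seurat::RunPCA(SE)                # PCA on the scaled variable features
  SE <- Seurat::RunUMAP(SE, dims = dims)  # UMAP on the chosen principal components
  SE
}
```

`reduction_SE()` may choose different parameters or add steps; check the function documentation for what it actually does.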
- translate nucleotide sequences into amino acids, starting from the first character
```{r shortcut_bio-nt2aa}
nt2aa(c("ATGAAA", "TTGCCC", "CTGTTT"))
```
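
To illustrate what translating from the first character means, here is a minimal base-R sketch using the standard genetic code. It is independent of the package: `translate_sketch` and `codon_table` are hypothetical names, and how `nt2aa()` itself handles trailing nucleotides or ambiguous bases is not shown here and may differ.

```r
# Standard genetic code, codons enumerated in T, C, A, G order
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
codon_table <- setNames(aa, codons)

# Hypothetical helper: read codons from position 1, drop incomplete
# trailing codons, and mark unknown codons as "X" (assumptions of this sketch)
translate_sketch <- function(nt) {
  vapply(nt, function(s) {
    s <- toupper(s)
    n <- nchar(s) %/% 3
    if (n == 0) return("")
    starts <- seq(1, by = 3, length.out = n)
    res <- unname(codon_table[substring(s, starts, starts + 2)])
    res[is.na(res)] <- "X"
    paste(res, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

translate_sketch(c("ATGAAA", "TTGCCC", "CTGTTT"))
# returns "MK" "LP" "LF"
```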
- build an antigen map from sera titer data
```{r shortcut_bio-antigen_map}
# antigen_map(data, sera_meta, ag_meta, seed=14)
```
## IO
```
read_fasta()
write_fasta()
read_vcf()
read_gff()
```
## S4 classes in `genogamesh`
### mutstr
- an S4 class to manipulate mutation strings
- supports set operations
``` {r mutstr}
raw_mut_string <- c(
variant1 = "T10I,D20N,Q30E,A40T,P50L,G60R",
variant2 = "T10I,D20-,Q30E,A40T,P50L,G60R,S80R",
variant3 = "T10A,D20G,Q30E,A40T,P50L,G60R"
)
m <- mutstr(raw_mut_string, sep = ",")
m
names(m)
mstr(m)
mut(m)
m[1:2]
m[[2]]
intersect(m, m[1])
setdiff(m, m[1])
union(m, m[1])
```
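
The idea behind the set operations can be sketched in base R by splitting each mutation string into a character vector. This is only an illustration under that assumption: `split_muts` is a hypothetical helper, and the `mutstr` methods may return differently structured results (for example, one result per variant).

```r
raw_mut_string <- c(
  variant1 = "T10I,D20N,Q30E,A40T,P50L,G60R",
  variant2 = "T10I,D20-,Q30E,A40T,P50L,G60R,S80R"
)

# Hypothetical helper: one character vector of mutations per variant
split_muts <- function(x, sep = ",") strsplit(x, sep, fixed = TRUE)
muts <- split_muts(raw_mut_string)

# mutations shared by both variants
intersect(muts$variant1, muts$variant2)
# returns "T10I" "Q30E" "A40T" "P50L" "G60R"

# mutations found only in variant2
setdiff(muts$variant2, muts$variant1)
# returns "D20-" "S80R"

# all distinct mutations across both variants
union(muts$variant1, muts$variant2)
```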