-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
102 lines (71 loc) · 3.63 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
title: "README"
author: "Kimberle Shen"
date: "9/28/2020"
output:
md_document:
variant: markdown_github
---
# rfca
This package contains functions that trains a Random Forest Model with a labelled Seurat Object, for predicting cell types/states in unlabelled datasets. It also contains a pre-trained Random Forest model, as well as example datasets.
# Introduction
Manual cell annotation of scRNAseq datasets, typically based on marker genes, can be time-consuming and biased. Being able to automatically predict cell types/states in a cell-by-cell and cluster-unbiased way is useful for fast and accurate phenotyping.
In addition, despite the increasing amounts of scRNAseq datasets being generated, thorough analysis of these datasets is lagging, and/or done in silos. This package comes with a preloaded Random Forest model based on different datasets and cell types/states, that will be constantly updated.
# Installation
You can install this package from GitHub with `devtools::install_github("kimberle9/rfca")`
# Manual
View the manual here: https://kimberle9.github.io/rfca/articles/vignette.html
# Examples
## Example 1: Creating a Seurat Object
```{r eval = FALSE}
# Load the rfca and Seurat libraries, as well as example datasets
library(rfca)
library(Seurat)
# Create Seurat Object with PCA and UMAP calculated
# Cell ranger raw data ("/filtered_feature_bc_matrix" directory) is required for this step
mySeuratObject <- createSeuratObjectPipeline(data.dir = "~/filtered_feature_bc_matrix",
nFeature_RNA_lower = 500,
nFeature_RNA_upper = 5000,
percent.mt = 5,
nfeatures = 2000,
dims = 20,
clusterResolution = 0.8)
# Assign cell type Idents to mySeuratObject manually, if you want to use it as a training dataset
```
## Example 2: Using Random Forest to train and predict cell types
```{r}
library(rfca)
library(Seurat)
data("exampleSeuratObjectUnlabelled")
data("exampleSeuratObjectLabelled")
# Create Random Forest Model with your labelled Seurat Object
myRandomForestModel <- createRFModel(exampleSeuratObjectLabelled)
# Create marker gene list from random forest model
markerGeneList <- createGeneLists(myRandomForestModel)
# Visualize Feature Plot based on marker gene list
myPlot <- cellMarkerPlots(exampleSeuratObjectLabelled, geneList = markerGeneList)
myPlot
# Predict cells based on your own Random Forest Model created above
autoLabelledSeuratObject <- predictCells(exampleSeuratObjectUnlabelled, myRandomForestModel)
# Predict cells based on my pre-loaded and pre-trained Random Forest Model
# With no model passed in, default model used is mouseBrain.
# See documentation for available tissue types.
autoLabelledSeuratObject <- predictCells(exampleSeuratObjectUnlabelled)
# Visualize autoLabelledSeuratObject
DimPlot(autoLabelledSeuratObject)
```
## Example 3: Defining microglia subtypes
```{r eval = FALSE}
library(rfca)
library(Seurat)
# Create Seurat object (microglia-only data)
microglia <- createSeuratObjectPipeline(data.dir = "~/filtered_feature_bc_matrix")
# Define microglia subtypes based on cluster numbers
DimPlot(microglia)
# Build Random Forest Model with cluster number as labels
microgliaModel <- createRFModel(microglia)
# Phenotype subtype proportion for different mouse models
mouseModel1 <- predictCells(mouseModel1, microgliaModel)
mouseModel2 <- predictCells(mouseModel2, microgliaModel)
mouseModel3 <- predictCells(mouseModel3, microgliaModel)
```