Merge pull request #47 from databio/dev
Major updates to pipeline use
Showing 94 changed files with 25,959 additions and 2,539 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
--- | ||
title: "PEPPRO BiocProject" | ||
author: "Jason Smith" | ||
date: "`r Sys.Date()`" | ||
output: rmarkdown::html_vignette | ||
vignette: > | ||
%\VignetteIndexEntry{PEPPRO BiocProject} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
|
||
```{r setup, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>" | ||
) | ||
``` | ||
|
||
# Introduction | ||
|
||
Before you start, see the [Getting started with `BiocProject` vignette](http://code.databio.org/BiocProject/articles/vignette1getStarted.html) for basic `BiocProject` information and installation instructions, and the [`PEPPRO` website](http://peppro.databio.org) for information about this nascent RNA profiling pipeline. | ||
|
||
`BiocProject` provides a straightforward method to read in pipeline outputs as listed in the `outputs` section of the pipeline's [pipeline interface](http://code.databio.org/looper/pipeline-interface/). | ||
|
||
__With a single line of code you can read all the indicated results and your project metadata.__ | ||
|
||
# Read the results of a `PEPPRO` run | ||
|
||
The function shown below reads in the gene count `BED` files from the `outputs` section specified in the [`PEPPRO` pipeline interface](https://github.com/databio/peppro/blob/master/pipeline_interface.yaml). | ||
|
||
How the output files are read is defined in a [function](https://github.com/databio/peppro/blob/master/BiocProject/readPepproGeneCounts.R) supplied by the `PEPPRO` developers. The `BiocProject` function identifies the function listed in the `bioconductor` section of the `PEPPRO` pipeline interface file, sources it, and automatically executes it on samples matching the protocols bound to the pipeline specified as an argument to the [`outputsByPipeline`](http://code.databio.org/BiocProject/reference/outputsByPipeline.html) function. | ||
|
||
## Get the project config | ||
|
||
```{r echo=T, message=FALSE} | ||
library(BiocProject) | ||
ProjectConfig = "peppro_da.yaml" | ||
``` | ||
|
||
## Run the `BiocProject` function | ||
|
||
```{r} | ||
bp = BiocProject(ProjectConfig) | ||
``` | ||
|
||
As you can see in the message above, the `readPepproGeneCounts` function was sourced from the file indicated in the `PEPPRO` pipeline interface. | ||
|
||
## Browse the results | ||
|
||
The read data is conveniently stored in a `List` object with a [`pepr::Project`](http://code.databio.org/pepr/reference/Project-class.html) object in its metadata slot: | ||
|
||
```{r} | ||
bp | ||
``` |
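The vignette above describes a read function being located by name, sourced from a file, and executed on the project. The following is a minimal base R sketch of that general pattern only; it is not the `BiocProject` implementation, and `readDemo`, `run_read_function` and the list-based `project` are hypothetical stand-ins introduced for illustration.

```r
# Write a stand-in read function to a temporary file.
# (Hypothetical content; the real function is readPepproGeneCounts.)
fun_path <- tempfile(fileext = ".R")
writeLines(c(
  "readDemo <- function(project) {",
  "  paste('read', length(project$samples), 'samples')",
  "}"
), fun_path)

# Source a file into a private environment, look the function up by name,
# and call it with the project as its only argument.
run_read_function <- function(fun_name, fun_path, project) {
  env <- new.env()
  source(fun_path, local = env)
  fun <- get(fun_name, envir = env, mode = "function")
  fun(project)
}

# A plain list standing in for a pepr::Project object (assumption).
project <- list(samples = c("H9_DMSO_rep1", "H9_DMSO_rep2"))
result <- run_read_function("readDemo", fun_path, project)
print(result)
```

Sourcing into a separate environment keeps the user's workspace clean; whether `BiocProject` does the same is not stated in the vignette.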
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
sample_name,toggle,protocol,organism,read_type,umi_status,umi_length,data_source,read1,read2,srr,experiment,geo,Assay Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Cell_Line,Cell_type,Center Name,Consent,DATASTORE filetype,DATASTORE provider,DATASTORE region,Instrument,LibraryLayout,LibrarySelection,LibrarySource,Organism,Platform,ReleaseDate,Sample Name,source_name,SRA Study,treatment | ||
H9_DMSO_rep1,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669536,SRX7348011,GSM4214080,OTHER,83,4066110724,PRJNA594951,SAMN13541464,1576539216,H9,embryonic stem cells,GEO,public,"fastq,sra","s3,ncbi,gs","s3.us-east-1,gs.US,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214080,H9 cells,SRP236879,control | ||
H9_DMSO_rep2,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669537,SRX7348012,GSM4214081,OTHER,83,4824187397,PRJNA594951,SAMN13541463,1851708346,H9,embryonic stem cells,GEO,public,"sra,fastq","gs,s3,ncbi","gs.US,s3.us-east-1,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214081,H9 cells,SRP236879,control | ||
H9_DMSO_rep3,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669538,SRX7348013,GSM4214082,OTHER,83,3857336412,PRJNA594951,SAMN13541462,1508923353,H9,embryonic stem cells,GEO,public,"sra,fastq","s3,ncbi,gs","gs.US,s3.us-east-1,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214082,H9 cells,SRP236879,control | ||
H9_200nM_romidepsin_rep1,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669539,SRX7348014,GSM4214083,OTHER,83,4636791999,PRJNA594951,SAMN13541471,1798846852,H9,embryonic stem cells,GEO,public,"fastq,sra","gs,ncbi,s3","gs.US,ncbi.public,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214083,H9 cells,SRP236879,60 minutes 200nM romidepsin | ||
H9_200nM_romidepsin_rep2,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669540,SRX7348015,GSM4214084,OTHER,82,4730726832,PRJNA594951,SAMN13541470,1833437275,H9,embryonic stem cells,GEO,public,"fastq,sra","ncbi,gs,s3","gs.US,ncbi.public,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214084,H9 cells,SRP236879,60 minutes 200nM romidepsin | ||
H9_200nM_romidepsin_rep3,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669541,SRX7348016,GSM4214085,OTHER,83,5015008131,PRJNA594951,SAMN13541469,1922230177,H9,embryonic stem cells,GEO,public,"fastq,sra","s3,gs,ncbi","ncbi.public,gs.US,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214085,H9 cells,SRP236879,60 minutes 200nM romidepsin |
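As a quick sanity check of the sample table above, the sketch below loads a subset of its columns with base R and counts samples per treatment. The values are copied from the table; only five of the original columns are kept to stay short, and the inline `csv_text` replaces reading `peppro_da.csv` from disk.

```r
# A five-column excerpt of the sample table (values copied from the CSV above).
csv_text <- "sample_name,protocol,organism,srr,treatment
H9_DMSO_rep1,PRO,human,SRR10669536,control
H9_DMSO_rep2,PRO,human,SRR10669537,control
H9_DMSO_rep3,PRO,human,SRR10669538,control
H9_200nM_romidepsin_rep1,PRO,human,SRR10669539,60 minutes 200nM romidepsin
H9_200nM_romidepsin_rep2,PRO,human,SRR10669540,60 minutes 200nM romidepsin
H9_200nM_romidepsin_rep3,PRO,human,SRR10669541,60 minutes 200nM romidepsin
"

# Keep strings as character vectors rather than factors.
samples <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Number of replicates per treatment group.
treatment_counts <- table(samples$treatment)
print(treatment_counts)
```

When reading the full file, passing `check.names = FALSE` to `read.csv` would preserve headers that contain spaces, such as `Assay Type`.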
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Run PEPPRO paper differential analysis samples through PEPPRO | ||
name: PEPPRO | ||
|
||
metadata: | ||
sample_table: "peppro_da.csv" | ||
output_dir: "$PROCESSED/peppro/paper/da" | ||
pipeline_interfaces: "$CODE/peppro/pipeline_interface.yaml" | ||
bioconductor: | ||
readFunName: readPepproGeneCounts | ||
readFunPath: readPepproGeneCounts.R | ||
|
||
derived_columns: [read1, read2] | ||
|
||
data_sources: | ||
PE1: "${SRAFQ}/{srr}_1.fastq.gz" | ||
PE2: "${SRAFQ}/{srr}_2.fastq.gz" | ||
|
||
implied_columns: | ||
organism: | ||
human: | ||
genome: hg38 | ||
prealignments: human_rDNA | ||
max_len: -1 | ||
umi_status: | ||
true_8: | ||
umi_len: 8 | ||
|
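The `derived_columns`, `data_sources` and `implied_columns` sections above can be read as simple lookups: a derived column's value (for example `PE1`) selects a path template that is filled from sample attributes and environment variables, and an implied column's value (for example `human`) adds further attributes. The base R sketch below illustrates that reading only; it is not the `pepr` or `looper` implementation, the helpers `expand_template` and `apply_implied` are hypothetical, and the `SRAFQ` value is an invented placeholder.

```r
# Hypothetical helper: fill ${ENV} variables, then {attribute} placeholders.
expand_template <- function(template, sample, env) {
  out <- template
  for (var in names(env)) {
    out <- gsub(paste0("${", var, "}"), env[[var]], out, fixed = TRUE)
  }
  for (attr in names(sample)) {
    out <- gsub(paste0("{", attr, "}"), sample[[attr]], out, fixed = TRUE)
  }
  out
}

# Hypothetical helper: add attributes implied by the value of a column.
apply_implied <- function(sample, implied) {
  for (column in names(implied)) {
    value <- sample[[column]]
    if (is.null(value)) next
    extra <- implied[[column]][[value]]
    if (!is.null(extra)) sample[names(extra)] <- extra
  }
  sample
}

# Templates and implications copied from the config above.
data_sources <- c(PE1 = "${SRAFQ}/{srr}_1.fastq.gz",
                  PE2 = "${SRAFQ}/{srr}_2.fastq.gz")
implied <- list(
  organism   = list(human = list(genome = "hg38",
                                 prealignments = "human_rDNA",
                                 max_len = -1)),
  umi_status = list(true_8 = list(umi_len = 8))
)

# One row of the sample table, reduced to the attributes used here.
sample <- list(sample_name = "H9_DMSO_rep1", srr = "SRR10669536",
               read1 = "PE1", read2 = "PE2",
               organism = "human", umi_status = "true_8")
env <- c(SRAFQ = "/data/sra_fastq")  # assumed value, for illustration only

read1_path <- expand_template(data_sources[[sample$read1]], sample, env)
print(read1_path)  # "/data/sra_fastq/SRR10669536_1.fastq.gz"

sample_full <- apply_implied(sample, implied)
print(sample_full$genome)  # "hg38"
```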
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
readPepproGeneCounts = function(project) { | ||
cwd <- getwd() | ||
project_dir <- pepr::config(project)$metadata$output_dir | ||
sample_names <- pepr::samples(project)$sample_name | ||
genomes <- as.list(pepr::samples(project)$genome) | ||
names(genomes) <- sample_names | ||
paths <- vector("list", length(sample_names)) | ||
names(paths) <- sample_names | ||
|
||
for (sample in sample_names) { | ||
paths[[sample]] <- paste(project_dir, 'results_pipeline', sample, | ||
paste0('signal_', genomes[[sample]]), | ||
paste0(sample, "_gene_coverage.bed"), sep="/") | ||
} | ||
|
||
result <- lapply(paths, function(x){ | ||
#message(paste0("x: ", x)) | ||
if (file.exists(x)) { | ||
df <- data.table::fread(x) | ||
colnames(df) <- c('chr', 'start', 'end', 'geneName', | ||
'score', 'strand', 'count') | ||
gr <- GenomicRanges::GRanges(df) | ||
} else { | ||
gr <- GenomicRanges::GRanges() | ||
} | ||
}) | ||
|
||
setwd(cwd) | ||
#names(result) <- sample_names | ||
return(GenomicRanges::GRangesList(Filter(length, result))) | ||
} |
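Two details of the function above are easy to miss: the directory layout it assumes for each sample's gene coverage file, and the effect of `Filter(length, result)`. The base R sketch below reproduces both in isolation. `gene_coverage_path` is a hypothetical helper and the output directory is an invented placeholder; the path components themselves are taken from the function above.

```r
# Hypothetical helper mirroring the path built inside readPepproGeneCounts.
gene_coverage_path <- function(output_dir, sample_name, genome) {
  file.path(output_dir, "results_pipeline", sample_name,
            paste0("signal_", genome),
            paste0(sample_name, "_gene_coverage.bed"))
}

# Assumed output directory, for illustration only.
path <- gene_coverage_path("/processed/peppro/paper/da", "H9_DMSO_rep1", "hg38")
print(path)
# "/processed/peppro/paper/da/results_pipeline/H9_DMSO_rep1/signal_hg38/H9_DMSO_rep1_gene_coverage.bed"

# Filter(length, x) keeps only elements whose length is non-zero, so a sample
# whose file was missing (an empty result) is dropped without any message.
results <- list(H9_DMSO_rep1 = 1:3, H9_DMSO_rep2 = integer(0))
kept <- Filter(length, results)
print(names(kept))  # "H9_DMSO_rep1"
```

Because empty results are dropped, the returned list can contain fewer elements than the project has samples; comparing its names against the sample table is one way to spot missing files.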