Merge pull request #47 from databio/dev
Major updates to pipeline use
jpsmith5 authored Jan 28, 2020
2 parents 2e57c0a + 116a92c commit 04eb2b9
Showing 94 changed files with 25,959 additions and 2,539 deletions.
25 changes: 25 additions & 0 deletions .gitignore
@@ -1,3 +1,27 @@
# General
*.pyc
.~lock*

# Tests
.cache/
*peppro_test*

# MkDocs files
site/

# Jekyll files
jekyll/
_site
.DS_store
.jekyll
.bundle
.sass-cache
_site/
/_site/
.sass-cache/
.jekyll-metadata

# Annotation files
# ignore local annotation files
anno/hg19_annotations.bed.gz
anno/hg19_annotations.bed
@@ -7,3 +31,4 @@ anno/mm10_annotations.bed.gz
anno/mm10_annotations.bed
anno/mm9_annotations.bed.gz
anno/mm9_annotations.bed

55 changes: 55 additions & 0 deletions BiocProject/PEPPRO_BiocProject.Rmd
@@ -0,0 +1,55 @@
---
title: "PEPPRO BiocProject"
author: "Jason Smith"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{PEPPRO BiocProject}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# Introduction

Before you start, see the [Getting started with `BiocProject` vignette](http://code.databio.org/BiocProject/articles/vignette1getStarted.html) for basic `BiocProject` information and installation instructions, and the [`PEPPRO` website](http://peppro.databio.org) for information about this nascent RNA profiling pipeline.

`BiocProject` provides a straightforward method to read in pipeline outputs as listed in the `outputs` section of the [pipeline interface](http://code.databio.org/looper/pipeline-interface/).

__With a single line of code you can read all the indicated results and your project metadata.__

# Read the results of a `PEPPRO` run

The function shown below reads in the gene count `BED` files from the `outputs` section specified in the [`PEPPRO` pipeline interface](https://github.com/databio/peppro/blob/master/pipeline_interface.yaml).

How the output files are read is defined in a [function](https://github.com/databio/peppro/blob/master/BiocProject/readPepproGeneCounts.R) supplied by the `PEPPRO` developers. `BiocProject` identifies the function listed in the `bioconductor` section of the `PEPPRO` pipeline interface file, sources it, and automatically executes it on samples matching the protocols bound to the pipeline that is specified as an argument to the [`outputsByPipeline`](http://code.databio.org/BiocProject/reference/outputsByPipeline.html) function.
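
The lookup-and-call step described above can be illustrated with a minimal base R sketch. This is not how `BiocProject` is implemented internally; it only shows the general idea of sourcing a file and calling a function by name. The `readDemoCounts` function, the temporary file, and the `project` list are hypothetical stand-ins for the real read function and the `pepr::Project` object.

```r
# Hypothetical stand-in for a read function shipped alongside a pipeline.
fun_path <- tempfile(fileext = ".R")
writeLines(c(
  "readDemoCounts <- function(project) {",
  "  paste0(project$samples, '_gene_coverage.bed')",
  "}"
), fun_path)

# Stand-in for a `bioconductor` section: a function name and the file defining it.
bioc_section <- list(readFunName = "readDemoCounts", readFunPath = fun_path)

# Source the file into its own environment, then look the function up by name.
fun_env <- new.env()
source(bioc_section$readFunPath, local = fun_env)
read_fun <- get(bioc_section$readFunName, envir = fun_env, mode = "function")

# Call the function on a toy "project" (a plain list, not a pepr::Project).
project <- list(samples = c("H9_DMSO_rep1", "H9_DMSO_rep2"))
demo_result <- read_fun(project)
print(demo_result)
```

Sourcing into a dedicated environment keeps the read function from overwriting objects in the caller's workspace, which is one reasonable design choice for this kind of plug-in mechanism.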

## Get the project config

```{r echo=T, message=FALSE}
library(BiocProject)
ProjectConfig = "peppro_da.yaml"
```

## Run the `BiocProject` function

```{r}
bp = BiocProject(ProjectConfig)
```

As you can see in the message above, the `readPepproGeneCounts` function was sourced from the file indicated in the `PEPPRO` pipeline interface.

## Browse the results

The data that was read is conveniently stored in a `List` object with a [`pepr::Project`](http://code.databio.org/pepr/reference/Project-class.html) object in its metadata slot:

```{r}
bp
```
7 changes: 7 additions & 0 deletions BiocProject/peppro_da.csv
@@ -0,0 +1,7 @@
sample_name,toggle,protocol,organism,read_type,umi_status,umi_length,data_source,read1,read2,srr,experiment,geo,Assay Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Cell_Line,Cell_type,Center Name,Consent,DATASTORE filetype,DATASTORE provider,DATASTORE region,Instrument,LibraryLayout,LibrarySelection,LibrarySource,Organism,Platform,ReleaseDate,Sample Name,source_name,SRA Study,treatment
H9_DMSO_rep1,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669536,SRX7348011,GSM4214080,OTHER,83,4066110724,PRJNA594951,SAMN13541464,1576539216,H9,embryonic stem cells,GEO,public,"fastq,sra","s3,ncbi,gs","s3.us-east-1,gs.US,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214080,H9 cells,SRP236879,control
H9_DMSO_rep2,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669537,SRX7348012,GSM4214081,OTHER,83,4824187397,PRJNA594951,SAMN13541463,1851708346,H9,embryonic stem cells,GEO,public,"sra,fastq","gs,s3,ncbi","gs.US,s3.us-east-1,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214081,H9 cells,SRP236879,control
H9_DMSO_rep3,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669538,SRX7348013,GSM4214082,OTHER,83,3857336412,PRJNA594951,SAMN13541462,1508923353,H9,embryonic stem cells,GEO,public,"sra,fastq","s3,ncbi,gs","gs.US,s3.us-east-1,ncbi.public",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214082,H9 cells,SRP236879,control
H9_200nM_romidepsin_rep1,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669539,SRX7348014,GSM4214083,OTHER,83,4636791999,PRJNA594951,SAMN13541471,1798846852,H9,embryonic stem cells,GEO,public,"fastq,sra","gs,ncbi,s3","gs.US,ncbi.public,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214083,H9 cells,SRP236879,60 minutes 200nM romidepsin
H9_200nM_romidepsin_rep2,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669540,SRX7348015,GSM4214084,OTHER,82,4730726832,PRJNA594951,SAMN13541470,1833437275,H9,embryonic stem cells,GEO,public,"fastq,sra","ncbi,gs,s3","gs.US,ncbi.public,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214084,H9 cells,SRP236879,60 minutes 200nM romidepsin
H9_200nM_romidepsin_rep3,1,PRO,human,PAIRED,true_8,8,SRA,PE1,PE2,SRR10669541,SRX7348016,GSM4214085,OTHER,83,5015008131,PRJNA594951,SAMN13541469,1922230177,H9,embryonic stem cells,GEO,public,"fastq,sra","s3,gs,ncbi","ncbi.public,gs.US,s3.us-east-1",NextSeq 500,PAIRED,other,TRANSCRIPTOMIC,Homo sapiens,ILLUMINA,2019-12-13T00:00:00Z,GSM4214085,H9 cells,SRP236879,60 minutes 200nM romidepsin
27 changes: 27 additions & 0 deletions BiocProject/peppro_da.yaml
@@ -0,0 +1,27 @@
# Run PEPPRO paper differential analysis samples through PEPPRO
name: PEPPRO

metadata:
  sample_table: "peppro_da.csv"
  output_dir: "$PROCESSED/peppro/paper/da"
  pipeline_interfaces: "$CODE/peppro/pipeline_interface.yaml"
bioconductor:
  readFunName: readPepproGeneCounts
  readFunPath: readPepproGeneCounts.R

derived_columns: [read1, read2]

data_sources:
  PE1: "${SRAFQ}/{srr}_1.fastq.gz"
  PE2: "${SRAFQ}/{srr}_2.fastq.gz"

implied_columns:
  organism:
    human:
      genome: hg38
      prealignments: human_rDNA
      max_len: -1
  umi_status:
    true_8:
      umi_len: 8
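
As a rough illustration of what the `derived_columns`, `data_sources`, and `implied_columns` sections in the config above express, the following base R sketch resolves them by hand for two rows of the sample table. It is a reading of the config, not the code that `pepr` or `looper` actually runs. The `/data/sra_fastq` directory is a hypothetical value standing in for the expanded `${SRAFQ}` environment variable.

```r
# Toy sample table with only the columns needed for this illustration.
samples <- data.frame(
  sample_name = c("H9_DMSO_rep1", "H9_200nM_romidepsin_rep1"),
  organism    = "human",
  read1       = "PE1",
  srr         = c("SRR10669536", "SRR10669539"),
  stringsAsFactors = FALSE
)

# data_sources entry; the directory is a hypothetical expansion of ${SRAFQ}.
data_sources <- list(PE1 = "/data/sra_fastq/{srr}_1.fastq.gz")

# Derived column: swap the key stored in `read1` for its template,
# then fill the {srr} placeholder from the same row.
templates <- unlist(data_sources[samples$read1])
samples$read1 <- mapply(
  function(template, srr) sub("{srr}", srr, template, fixed = TRUE),
  templates, samples$srr, USE.NAMES = FALSE
)

# Implied column: a sample whose organism is "human" gains genome "hg38".
implied <- list(human = list(genome = "hg38"))
samples$genome <- vapply(
  samples$organism,
  function(org) implied[[org]]$genome,
  character(1), USE.NAMES = FALSE
)

print(samples[, c("sample_name", "read1", "genome")])
```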

31 changes: 31 additions & 0 deletions BiocProject/readPepproGeneCounts.R
@@ -0,0 +1,31 @@
readPepproGeneCounts = function(project) {
    # Remember the working directory so it can be restored before returning
    cwd <- getwd()
    project_dir  <- pepr::config(project)$metadata$output_dir
    sample_names <- pepr::samples(project)$sample_name
    genomes      <- as.list(pepr::samples(project)$genome)
    names(genomes) <- sample_names
    paths        <- vector("list", length(sample_names))
    names(paths) <- sample_names

    # Build the expected gene coverage file path for every sample
    for (sample in sample_names) {
        paths[[sample]] <- paste(project_dir, 'results_pipeline', sample,
                                 paste0('signal_', genomes[[sample]]),
                                 paste0(sample, "_gene_coverage.bed"), sep="/")
    }

    # Read each existing file into a GRanges object; missing files yield
    # an empty GRanges that is filtered out below
    result <- lapply(paths, function(x){
        #message(paste0("x: ", x))
        if (file.exists(x)) {
            # Namespace-qualified so data.table does not need to be attached
            df <- data.table::fread(x)
            colnames(df) <- c('chr', 'start', 'end', 'geneName',
                              'score', 'strand', 'count')
            gr <- GenomicRanges::GRanges(df)
        } else {
            gr <- GenomicRanges::GRanges()
        }
        gr
    })

    setwd(cwd)
    #names(result) <- sample_names
    # Drop empty entries and return the remaining samples as a GRangesList
    return(GenomicRanges::GRangesList(Filter(length, result)))
}
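
The directory layout that the function above assumes can be checked without `pepr` or `GenomicRanges`. The sketch below rebuilds the same paths with `file.path`, vectorised over samples; `project_dir` and the genome assignments are hypothetical values chosen for illustration, whereas in the real function they come from the project config and sample table.

```r
# Hypothetical output directory and per-sample genomes.
project_dir  <- "/tmp/peppro_demo"
sample_names <- c("H9_DMSO_rep1", "H9_DMSO_rep2")
genomes      <- c(H9_DMSO_rep1 = "hg38", H9_DMSO_rep2 = "hg38")

# Same layout as the paste(..., sep = "/") call above, built for all samples at once.
paths <- file.path(project_dir, "results_pipeline", sample_names,
                   paste0("signal_", genomes[sample_names]),
                   paste0(sample_names, "_gene_coverage.bed"))
names(paths) <- sample_names

# Keep only the files that exist, mirroring the file.exists() guard.
existing <- paths[file.exists(paths)]

print(paths)
```

Listing the expected paths this way before running the full read function can help spot a mismatched `output_dir` or genome name.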
2 changes: 1 addition & 1 deletion PEPPROr/DESCRIPTION
@@ -1,6 +1,6 @@
Package: PEPPROr
Title: Functions and libraries to analyze pro-seq (or gro-seq) data
-Version: 0.0.1.0000
+Version: 0.0.2.0000
Authors@R: person("Jason", "Smith", email = "jasonsmith@virginia.edu", role = c("aut", "cre"))
Maintainer: Jason Smith <jasonsmith@virginia.edu>
Description: Installs required libraries to calculate the fraction of reads in features, to plot library complexity curves, TSS enrichments, and fragment length distributions.
1 change: 1 addition & 0 deletions PEPPROr/NAMESPACE
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(plotCutadapt)
export(plotAdapt)
export(plotPI)
export(mRNAcontamination)
export(plotFRiF)