Commit cb69ad6

Merge branch 'dev'
2 parents ea2d1e3 + f0dc32e commit cb69ad6

12 files changed: 55 additions and 87 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ Installation
 ------------

 - Note: It is not necessary to install scATAC-pro from scratch. You can use the docker or singularity version if your system support (see [Run scATAC-pro through docker or singularity](#run-scATAC-pro-through-docker-or-singularity) )
-- Run the following command in your terminal, scATAC-pro will be installed in YOUR\_INSTALL\_PATH/scATAC-pro\_1.4.3
+- Run the following command in your terminal, scATAC-pro will be installed in YOUR\_INSTALL\_PATH/scATAC-pro\_1.4.4

 <!-- -->

@@ -49,9 +49,9 @@ Installation
 Updates
 ------------
 - Now provide [scATAC-pro tutorial in R](https://scatacpro-in-r.netlify.app/index.html) for access QC metrics and perform downstream analysis
-- Current version: 1.4.3
+- Current version: 1.4.4
 - Highlighted updates
-* **New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.3)**
+* **New module *reprocess_cellranger_output* added, to reprocess 10x scATAC-seq data (including atac in 10x multiome assay) originally processed by cellranger, taking cellranger processed .bam and .fragments.tsv.gz files as input (v1.4.4)**
 * More friendly to single-end sequencing data (v1.4.2)
 * New module *labelTransfer* added, to do label trasfer (for cell annotation) from cell annotation of scRNA-seq data. First construct a gene by cell activity matrix, then use *FindTransferAnchors* and *TransferData* function from Seurat R package to predicted cell type annotation from the cell annotaiton in scRNA-seq data (v1.4.0)
 * New module *rmDoublets* added,to remove potential doublets using [DoubletFinder](https://github.com/chris-mcginnis-ucsf/DoubletFinder) algorithm (v1.3.1)
@@ -302,7 +302,7 @@ See [here](https://scatacpro-in-r.netlify.app/note_module) or in your terminal:
 usage : scATAC-pro -s STEP -i INPUT -c CONFIG [-o] [-h] [-v]
 Use option -h|--help for more information

-scATAC-pro 1.4.3
+scATAC-pro 1.4.4
 ---------------
 OPTIONS

complete_update_history.md

Lines changed: 5 additions & 0 deletions
@@ -1,4 +1,9 @@
 ## Complete Update History
+- Version 1.4.4 released
+* Only consider standard chromosomes in the *qc_per_barcode* module
+* Correct a minor bug in the *qc_per_barcode* module
+* Add version# in the html report
+* Clean and correct a minor bug in the *trimming* module
 - Version 1.4.3 released
 * add new module *reprocess_cellranger_output* to reprocess scATAC-seq data originally processed by cellranger
 - Version 1.4.2 released

scATAC-pro

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 #########################

 SOFT="scATAC-pro"
-VERSION="1.4.3"
+VERSION="1.4.4"

 function usage {
 echo -e "usage : $SOFT -s STEP -i INPUT -c CONFIG [-o] [-h] [-v] [-b]"

scripts/bam2qc.sh

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ ${SAMTOOLS_PATH}/samtools index -@ $ncore ${mapRes_dir}/${OUTPUT_PREFIX}.positio

 if [ $MAPQ -ne 30 ]; then
 ${SAMTOOLS_PATH}/samtools view -f $flag0 -b -h -q $MAPQ -@ $ncore $position_sort_bam -o ${mapRes_dir}/${OUTPUT_PREFIX}.positionsort.MAPQ${MAPQ}.bam
+${SAMTOOLS_PATH}/samtools index -@ $ncore ${mapRes_dir}/${OUTPUT_PREFIX}.positionsort.MAPQ${MAPQ}.bam
 fi

 ## mapping stats

scripts/mapping.sh

Lines changed: 2 additions & 2 deletions
@@ -93,7 +93,7 @@ ${R_PATH}/Rscript --vanilla ${curr_dir}/src/sort_frags.R ${qc_dir}/${OUTPUT_PREF
 ## index fragment file
 #sort -k1,1 -k2,2n -T ${mapRes_dir}/tmp/ ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv > ${qc_dir}/${OUTPUT_PREFIX}.fragments.sorted.tsv
 #mv ${qc_dir}/${OUTPUT_PREFIX}.fragments.sorted.tsv ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv
-${TABIX_PATH}/bgzip ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv
-${TABIX_PATH}/tabix -p bed ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv.gz
+${TABIX_PATH}/bgzip -f ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv
+${TABIX_PATH}/tabix -f -p bed ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv.gz

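
Reviewer note: the `-f` flags added in this hunk let the compress-and-index step be re-run when a `.fragments.tsv.gz` from an earlier run is still on disk. The sketch below illustrates that overwrite behaviour with plain `gzip`, used purely as a stand-in so the example runs without htslib installed. `compress_rerunnable` is a hypothetical helper name, and the sketch assumes `-f` means "overwrite without asking" for `bgzip` just as it does for `gzip`.

```shell
#!/bin/sh
# Sketch only: gzip stands in for bgzip (assumption: -f has the same
# overwrite-without-asking meaning in both tools).

# compress_rerunnable is a hypothetical helper, not part of scATAC-pro.
compress_rerunnable() {
    # $1: uncompressed file; produces $1.gz, replacing any stale $1.gz
    gzip -f "$1"
}

demo_dir=$(mktemp -d)

# First run: creates frags.tsv.gz and removes frags.tsv
printf 'chr1\t10\t50\tBC1\n' > "$demo_dir/frags.tsv"
compress_rerunnable "$demo_dir/frags.tsv"

# Second run: frags.tsv has been regenerated while the old .gz still exists.
# With -f the stale archive is overwritten instead of the command refusing.
printf 'chr1\t10\t50\tBC2\n' > "$demo_dir/frags.tsv"
compress_rerunnable "$demo_dir/frags.tsv"

# Show the record written by the second run
gzip -dc "$demo_dir/frags.tsv.gz"
```

The same reasoning applies to `tabix -f`, which forces an existing index to be rebuilt rather than kept.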

scripts/process_with_bam.sh

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ ${R_PATH}/Rscript --vanilla ${curr_dir}/src/sort_frags.R ${qc_dir}/${OUTPUT_PREF
 #sort -k1,1 -k2,2n -T ${mapRes_dir}/tmp/ ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv > ${qc_dir}/${OUTPUT_PREFIX}.fragments.sorted.tsv
 #mv ${qc_dir}/${OUTPUT_PREFIX}.fragments.sorted.tsv ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv
 ${TABIX_PATH}/bgzip -f ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv
-${TABIX_PATH}/tabix -p bed ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv.gz
+${TABIX_PATH}/tabix -f -p bed ${qc_dir}/${OUTPUT_PREFIX}.fragments.tsv.gz


 ## 2.call peak

scripts/report.sh

Lines changed: 2 additions & 3 deletions
@@ -23,9 +23,8 @@ configure_file=`basename ${2}`
 abs_configure_dir=`cd ${configure_dir}; pwd`
 abs_configure_file=${abs_configure_dir}/${configure_file}

-#${R_PATH}/Rscript --vanilla ${curr_dir}/src/render2report.R \
-# ${abs_report_dir}/scATAC-pro_report_${OUTPUT_PREFIX}.html $abs_out_dir ${work_dir}/${2}
+scatacpro_version=`scATAC-pro --version | cut -d " " -f3`

 ${R_PATH}/Rscript --vanilla ${curr_dir}/src/render2report.R \
-${abs_report_dir}/scATAC-pro_report_${OUTPUT_PREFIX}.html $abs_out_dir $abs_configure_file
+${abs_report_dir}/scATAC-pro_report_${OUTPUT_PREFIX}.html $abs_out_dir $abs_configure_file $scatacpro_version $OUTPUT_PREFIX
 echo "Report generation Done!"
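
Reviewer note: `cut -d " " -f3` keeps the third space-separated field of whatever `scATAC-pro --version` prints, and that value becomes the fourth argument to `render2report.R`. The sketch below shows the field extraction on literal strings. The banner format `scATAC-pro version 1.4.4` is an assumption (the diff does not show the real `--version` output), and `third_field` is a hypothetical helper.

```shell
#!/bin/sh
# third_field is a hypothetical helper mirroring `cut -d " " -f3` from report.sh.
third_field() {
    printf '%s\n' "$1" | cut -d " " -f3
}

# Assumed banner format; the actual --version output is not shown in this commit.
third_field "scATAC-pro version 1.4.4"
```

If the banner had only two space-separated fields, `cut` would emit an empty string here, and because `$scatacpro_version` is passed unquoted the later positional arguments would shift. Quoting the variable at the call site would guard against that.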

scripts/src/get_qc_per_barcode.R

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,7 @@
 library(data.table)
 library(Rcpp)
 library(Matrix)
-
+library(GenomicRanges)


 #sourceCpp(paste0('getOverlaps.cpp'))
@@ -122,11 +122,14 @@ frags = fread(frags.file, select=1:4, header = F)
 names(frags) = c('chr', 'start', 'end', 'bc')
 setkey(frags, chr, start)

+## only keep reads in standard chrs
+chrs = standardChromosomes(makeGRangesFromDataFrame(frags))
+frags = frags[chr %in% chrs]
 frags[, 'total_frags' := .N, by = bc]
 frags = frags[total_frags > 5]

-frags = frags[!grepl(chr, pattern = 'random', ignore.case = T)]
-frags = frags[!grepl(chr, pattern ='un', ignore.case = T)]
+#frags = frags[!grepl(chr, pattern = 'random', ignore.case = T)]
+#frags = frags[!grepl(chr, pattern ='un', ignore.case = T)]

 peaks = fread(peaks.file, select=1:3, header = F)
 tss = fread(tss.file, select=1:3, header = F)
@@ -147,7 +150,7 @@ if(file.exists(enhs.file)) {
 setkey(peaks, chr, start)
 setkey(tss, chr, start)

-chrs = unique(frags$chr)
+#chrs = unique(frags$chr)

 ## calculate tss enrichment score
 if(T){
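
Reviewer note: these hunks swap the substring filters (`random`, `un`) for a whitelist obtained from `standardChromosomes`. As a rough shell analogue of whitelist filtering, the sketch below keeps only fragment rows whose first column matches a fixed pattern. The regular expression is an assumption that approximates UCSC-style human and mouse names; it is not how GenomeInfoDb decides what is standard, and `keep_standard_chrs` is a hypothetical helper.

```shell
#!/bin/sh
# keep_standard_chrs is a hypothetical helper: a whitelist filter on column 1.
# The pattern is an assumed approximation of "standard" UCSC chromosome names.
keep_standard_chrs() {
    awk -F '\t' '$1 ~ /^chr([0-9]+|X|Y|M)$/'
}

# Four toy fragment records: chr, start, end, barcode
printf '%s\t%s\t%s\t%s\n' \
    chr1 100 200 BC1 \
    chrUn_gl000220 5 60 BC1 \
    chr6_apd_hap1 1 40 BC2 \
    chrX 7 90 BC2 | keep_standard_chrs
```

Only the `chr1` and `chrX` rows pass. A name such as `chr6_apd_hap1` contains neither `random` nor `un`, so the previous substring filters would have kept it, which is one reason a whitelist can be the stricter choice.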

scripts/src/labelTransfer.R

Lines changed: 2 additions & 56 deletions
@@ -15,7 +15,7 @@ gene_gtf_file = args[4]
 ## if gtf file not provided, using R bioconductor packages for gene annotation
 if(!file.exists(gene_gtf_file)){
 if(!grepl(GENOME_NAME, pattern = 'hg19|hg38|mm10|mm9', ignore.case = T)){
-stop('Genome is not belong to any of hg19,hg38,mm9 or mm10,
+stop('Genome does not belong to any of hg19,hg38,mm9 or mm10,
 please provide .gtf file for gene annotation!')
 }
 if(grepl(GENOME_NAME, pattern = 'mm10', ignore.case = T)) {
@@ -61,64 +61,10 @@ if(!file.exists(gene_gtf_file)){
 gene_ann[, 'gene_name' := unlist(strsplit(gene_name, ' '))[3], by = gene_name]
 names(gene_ann)[1] = 'chr'
 gene_ann = subset(gene_ann, select = c(chr, V4, V5, V7, gene_name))
-chrs = 1:22
-chrs = c(chrs, 'X', 'Y', 'M')
+chrs = standardChromosomes(makeGRangesFromDataFrame(gene_ann))
 gene_ann = gene_ann[chr %in% chrs]
 gene_ann = gene_ann[!duplicated(gene_name)]
 names(gene_ann)[2:4] = c('gene_start', 'gene_end', 'strand')
-gene_ann[, 'chr' := paste0('chr', chr)]
-
-}
-
-
-if(F){
-## download gtf file if not provided
-if(!file.exists(gene_gtf_file)){
-print('gene annotation gtf file not provided, I will try to download one:')
-err = 0
-if(grepl(GENOME_NAME, pattern = 'mm9', ignore.case = T)) {
-err <- tryCatch(download.file('ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz', temp),
-error = function(e) {
-print("Cannot download ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz!")
-return(1)})
-}
-if(grepl(GENOME_NAME, pattern = 'mm10', ignore.case = T)) {
-err <- tryCatch(download.file('ftp://ftp.ensembl.org/pub/release-95/gtf/mus_musculus/Mus_musculus.GRCm38.95.gtf.gz', temp),
-error = function(e) {
-print("Cannot download ftp://ftp.ensembl.org/pub/release-95/gtf/mus_musculus/Mus_musculus.GRCm38.95.gtf.gz!")
-return(1)})
-}
-if(grepl(GENOME_NAME, pattern = 'hg38', ignore.case = T)) {
-err <- tryCatch(download.file('ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz', temp),
-error = function(e) {
-print("Cannot download ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/Homo_sapiens.GRCh38.95.gtf.gz!")
-return(1)})
-}
-
-if(grepl(GENOME_NAME, pattern = 'hg19', ignore.case = T)) {
-err <- tryCatch(download.file('ftp://ftp.ensembl.org/pub/release-67/gtf/homo_sapiens/Homo_sapiens.GRCh37.67.gtf.gz', temp),
-error = function(e) {
-print("Cannot download ftp://ftp.ensembl.org/pub/release-67/gtf/homo_sapiens/Homo_sapiens.GRCh37.67.gtf.gz!")
-return(1)})
-}
-
-if(err == 1) stop('Download failed! Please provide a gtf file to run this module!')
-}
-
-gene_ann = fread(gene_gtf_file, sep = '\t')
-gene_ann = gene_ann[V3 == 'gene']
-gene_ann[, 'gene_name' := unlist(strsplit(V9, ';'))[3], by = V9]
-gene_ann[, 'gene_name' := gsub("\"", "", gene_name), by = gene_name]
-gene_ann[, 'gene_name' := unlist(strsplit(gene_name, ' '))[3], by = gene_name]
-names(gene_ann)[1] = 'chr'
-gene_ann = subset(gene_ann, select = c(chr, V4, V5, V7, gene_name))
-chrs = 1:22
-chrs = c(chrs, 'X', 'Y')
-gene_ann = gene_ann[chr %in% chrs]
-gene_ann = gene_ann[!duplicated(gene_name)]
-names(gene_ann)[2:4] = c('gene_start', 'gene_end', 'strand')
-gene_ann[, 'chr' := paste0('chr', chr)]
-
 }

 seurat.rna = readRDS(inputSeurat_rna)

scripts/src/render2report.R

Lines changed: 4 additions & 1 deletion
@@ -4,10 +4,13 @@ args = commandArgs(T)
 output_file = args[1]
 result_dir = args[2]
 configure_file = args[3]
+scatacpro_version = args[4]
+sample_name = args[5]

 argv <- commandArgs(trailingOnly = FALSE)
 curr_dir <- dirname(substring(argv[grep("--file=", argv)], 8))
 out_dir = dirname(output_file)
 rmarkdown::render(paste0(curr_dir, "/scATAC-pro_report.Rmd"), output_file=output_file,
 intermediates_dir = out_dir,
-params = list(output_dir = result_dir, configure_user = configure_file))
+params = list(set_title = scatacpro_version, set_sample = sample_name,
+output_dir = result_dir, configure_user = configure_file))

scripts/src/scATAC-pro_report.Rmd

Lines changed: 4 additions & 3 deletions
@@ -1,14 +1,15 @@
 ---
-title: scATAC-pro Report
 output:
 flexdashboard::flex_dashboard:
 vertical_layout: fill
 social: menu
 theme: united
 params:
+set_title: scATAC-pro Report
+set_sample: PMBC10K
 output_dir: /mnt/isilon/tan_lab/yuw1/run_scATAC-pro/PBMC10k/output
 configure_user: /mnt/isilon/tan_lab/yuw1/run_scATAC-pro/PBMC10k/configure_user.txt
-
+title: "scATAC-pro `r params$set_title`: `r params$set_sample`"
 ---

 <style type="text/css">
@@ -102,7 +103,7 @@ mapping_qc$frac = paste0(100*mapping_qc$frac, '%')

 mapping_qc = rbind(mapping_qc, data.frame(V1 ='Library Complexity (#unique fragments/#fragments)', V2 = '', frac = lib_complx))

-kable(mapping_qc, col.names = NULL, format = 'html', caption = paste('Sample:', OUTPUT_PREFIX)) %>%
+kable(mapping_qc, col.names = NULL, format = 'html') %>%
 kable_styling("striped", full_width = F, position = 'left', font_size = 15)

 write.table(mapping_qc, file = paste0(params$output_dir, '/summary/Tables/Global_Mapping_Statistics.tsv'),

scripts/trimming.sh

Lines changed: 22 additions & 12 deletions
@@ -11,6 +11,7 @@ input_fastqs=$1

 output_dir=${OUTPUT_DIR}/trimmed_fastq
 mkdir -p $output_dir
+mkdir -p ${OUTPUT_DIR}/demplxed_fastq

 fastqs=(${input_fastqs//,/ }) ## suppose the first fastq is the read file, the others are index fastq files
 nfile=${#fastqs[@]}
@@ -19,7 +20,12 @@ kk=$(( $nfile ))
 isSingleEnd=$(echo $isSingleEnd | tr a-z A-Z)

 if [[ "$isSingleEnd" = "TRUE" ]]; then
-prefix0=$(basename ${fastqs[0]})
+## make the output name consistent
+dex_fastq1=${OUTPUT_DIR}/demplxed_fastq/${OUTPUT_PREFIX}.demplxed.PE1.fastq.gz
+if [[ ! -f ${dex_fastq1} ]]; then
+ln -s ${fastqs[0]} $dex_fastq1
+fi
+prefix0=$(basename $dex_fastq1)
 if [ "$TRIM_METHOD" = 'Trimmomatic' ]; then
 echo "Using Trimmomatic ..."
 if [ -z $TRIMMOMATIC_PATH ]; then
@@ -34,7 +40,7 @@ if [[ "$isSingleEnd" = "TRUE" ]]; then
 exit
 fi

-java -jar ${TRIMMOMATIC_PATH}/*jar SE -phred33 ${fastqs[0]} ${output_dir}/trimmed_${prefix0} \
+java -jar ${TRIMMOMATIC_PATH}/*jar SE -phred33 $dex_fastq1 ${output_dir}/trimmed_${prefix0} \
 ILLUMINACLIP:${ADAPTER_SEQ}:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25

 mv $trimmed_fastq1 ${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.trimmed.demplxed.PE1.fastq.gz
@@ -43,23 +49,29 @@ if [[ "$isSingleEnd" = "TRUE" ]]; then
 echo "Using trim_galore ..."
 unset PYTHONHOME
 unset PYTHONPATH
-#dfastq1_pre=`echo $prefix0 | awk -F. '{print $1}'`
 trimmed_fastq1=${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.demplxed.PE1_val_1.fq.gz
 if [ -f "$trimmed_fastq1" ]; then
 echo -e "Trimmed fastq file $trimmed_fastq1 exist, I will skip trimming
 reads!"
 exit
 fi
-${TRIM_GALORE_PATH}/trim_galore -j 4 -o $output_dir ${fastqs[0]} --gzip --path_to_cutadapt ${CUTADAPT_PATH}/cutadapt
+${TRIM_GALORE_PATH}/trim_galore -j 4 -o $output_dir $dex_fastq1 --gzip --path_to_cutadapt ${CUTADAPT_PATH}/cutadapt

 mv $trimmed_fastq1 ${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.trimmed.demplxed.PE1.fastq.gz
 echo "Trimming Done!"
 else
 echo "You have not specify TRIM_METHOD, so I do not trim the reads"
 fi
 else
-prefix0=$(basename ${fastqs[0]})
-prefix1=$(basename ${fastqs[1]})
+## make the output name consistent
+dex_fastq1=${OUTPUT_DIR}/demplxed_fastq/${OUTPUT_PREFIX}.demplxed.PE1.fastq.gz
+dex_fastq2=${OUTPUT_DIR}/demplxed_fastq/${OUTPUT_PREFIX}.demplxed.PE2.fastq.gz
+if [[ ! -f ${dex_fastq1} ]]; then
+ln -s ${fastqs[0]} $dex_fastq1
+ln -s ${fastqs[1]} $dex_fastq2
+fi
+prefix0=$(basename $dex_fastq1)
+prefix1=$(basename $dex_fastq2)
 if [ "$TRIM_METHOD" = 'Trimmomatic' ]; then
 echo "Using Trimmomatic ..."
 if [ -z $TRIMMOMATIC_PATH ]; then
@@ -69,7 +81,7 @@ else

 trimmed_fastq1=${output_dir}/trimmed_paired_${prefix0}
 trimmed_fastq2=${output_dir}/trimmed_paired_${prefix1}
-java -jar ${TRIMMOMATIC_PATH}/*jar PE -threads 4 ${fastqs[0]} ${fastqs[1]} \
+java -jar ${TRIMMOMATIC_PATH}/*jar PE -threads 4 $dex_fastq1 $dex_fastq2 \
 ${output_dir}/trimmed_paired_${prefix0} ${output_dir}/trimmed_unpaired_${prefix0} \
 ${output_dir}/trimmed_paired_${prefix1} ${output_dir}/trimmed_unpaired_${prefix1} \
 ILLUMINACLIP:${ADAPTER_SEQ}:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:25
@@ -81,19 +93,17 @@ else
 echo "Using trim_galore ..."
 unset PYTHONHOME
 unset PYTHONPATH
-#dfastq1_pre=`echo $prefix0 | awk -F. '{print $1}'`
-#dfastq2_pre=`echo $prefix1 | awk -F. '{print $1}'`
+
+${TRIM_GALORE_PATH}/trim_galore -j 4 -o $output_dir $dex_fastq1 $dex_fastq2 --paired --gzip --path_to_cutadapt ${CUTADAPT_PATH}/cutadapt
+
 trimmed_fastq1=${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.demplxed.PE1_val_1.fq.gz
 trimmed_fastq2=${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.demplxed.PE2_val_2.fq.gz

-${TRIM_GALORE_PATH}/trim_galore -j 4 -o $output_dir ${fastqs[0]} ${fastqs[1]} --paired --gzip --path_to_cutadapt ${CUTADAPT_PATH}/cutadapt
-
 mv $trimmed_fastq1 ${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.trimmed.demplxed.PE1.fastq.gz
 mv $trimmed_fastq2 ${OUTPUT_DIR}/trimmed_fastq/${OUTPUT_PREFIX}.trimmed.demplxed.PE2.fastq.gz
 echo "Trimming Done!"
 else
 echo "You have not specify TRIM_METHOD, so I do not trim the reads"
 fi

-
 fi
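
Reviewer note: the trimming changes symlink the input FASTQs to `${OUTPUT_PREFIX}.demplxed.PE1.fastq.gz` and `.PE2.fastq.gz` so that trimmer output names no longer depend on the original file names. The sketch below shows the link-if-missing pattern on its own. `link_if_missing` is a hypothetical helper, and resolving the source to an absolute path first is an addition of this sketch, not something the diff does (a relative `ln -s` target is interpreted relative to the link's directory).

```shell
#!/bin/sh
# link_if_missing is a hypothetical helper, not part of scATAC-pro.
link_if_missing() {
    src=$1    # existing input fastq
    dest=$2   # standardized name to expose it under
    # Resolve to an absolute path so the link works from any directory
    # (assumption of this sketch; the diff links ${fastqs[0]} as given).
    src_abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    mkdir -p "$(dirname "$dest")"
    # Create the link only when nothing (file or dangling link) is there yet
    if [ ! -e "$dest" ] && [ ! -L "$dest" ]; then
        ln -s "$src_abs" "$dest"
    fi
}

demo_dir=$(mktemp -d)
: > "$demo_dir/sample_R1.fastq.gz"
dest="$demo_dir/demplxed_fastq/pbmc.demplxed.PE1.fastq.gz"

link_if_missing "$demo_dir/sample_R1.fastq.gz" "$dest"
link_if_missing "$demo_dir/sample_R1.fastq.gz" "$dest"   # second call is a no-op

# The standardized name is what the trimmers now see
basename "$dest"
```

Handling each mate with its own call also covers the case where the PE1 link exists but the PE2 link does not; the paired-end branch in the diff tests only `dex_fastq1` before creating both links.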
