10X Cell Ranger Alignment Workflow

The workflow script that runs the tools is workflows/kf_cell_ranger_10x_alignment.cwl

The workflow runs cellranger count on fastq files generated with the 10x single-cell RNA methodology. cellranger count performs alignment, barcode counting, and filtering. A custom QC R Markdown notebook developed by @AntoniaChroni is also run; its main engines are Seurat and scooter.

Software

  • Cellranger 6.1.2
  • Seurat 4.3.0.1
  • miQC 1.10.0

Inputs

multi-step

  • output_basename: basename used to name output files
  • sample_name: prefix used to find the fastqs to analyze, e.g. 1k_PBMCs_TotalSeq_B_3p_LT_antibody if the underlying fastqs are named like 1k_PBMCs_TotalSeq_B_3p_LT_antibody_S1_L001_I1_001.fastq.gz. Provide one per input fastq, in the same order
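To illustrate how a sample_name prefix relates to fastq file names, here is a minimal Python sketch. It assumes the usual `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` naming pattern shown in the example above. The helper `fastqs_for_sample` and the second sample in the file list are hypothetical and not part of the workflow; Cell Ranger does its own matching internally.

```python
import re

# Assumed naming pattern: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def fastqs_for_sample(sample_name, file_names):
    """Return the file names whose sample prefix equals sample_name exactly."""
    matched = []
    for name in file_names:
        m = FASTQ_NAME.match(name)
        if m and m.group("sample") == sample_name:
            matched.append(name)
    return matched


files = [
    "1k_PBMCs_TotalSeq_B_3p_LT_antibody_S1_L001_I1_001.fastq.gz",
    "1k_PBMCs_TotalSeq_B_3p_LT_antibody_S1_L001_R1_001.fastq.gz",
    "1k_PBMCs_TotalSeq_B_3p_LT_antibody_S1_L001_R2_001.fastq.gz",
    "other_sample_S2_L001_R1_001.fastq.gz",  # hypothetical second sample
]
selected = fastqs_for_sample("1k_PBMCs_TotalSeq_B_3p_LT_antibody", files)
print(selected)
```

Running a check like this on a fastq directory before submitting can catch a sample_name that does not match any file.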

optional concat and rename step

  • corrected_read_1_name: corrected read 1 names in the 10x expected format 'SampleName_S1_L001_R1_001'. When provided, must have the same length and order as the sample_name and corrected_read_2_name arrays.
  • corrected_read_2_name: corrected read 2 names in the 10x expected format 'SampleName_S1_L001_R2_001'. When provided, must have the same length and order as the sample_name and corrected_read_1_name arrays.
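The parallel-array requirement above can be sketched in Python as follows. Both helpers are hypothetical. Using `S1` and `L001` for every entry simply mirrors the format strings quoted above; whether that suits a given dataset is an assumption to verify.

```python
def corrected_read_names(sample_names):
    """Build read 1 and read 2 names in the 'SampleName_S1_L001_R?_001' format."""
    r1 = [f"{s}_S1_L001_R1_001" for s in sample_names]
    r2 = [f"{s}_S1_L001_R2_001" for s in sample_names]
    return r1, r2


def check_parallel(sample_names, r1_names, r2_names):
    """Raise ValueError unless the three arrays have equal length and line up."""
    if not (len(sample_names) == len(r1_names) == len(r2_names)):
        raise ValueError("sample_name and corrected read name arrays differ in length")
    for sample, r1, r2 in zip(sample_names, r1_names, r2_names):
        # Each corrected name is expected to start with its sample name.
        if not (r1.startswith(sample + "_") and r2.startswith(sample + "_")):
            raise ValueError(f"corrected names out of order for sample {sample!r}")


samples = ["1k_PBMCs_TotalSeq_B_3p_LT_antibody"]
r1_names, r2_names = corrected_read_names(samples)
check_parallel(samples, r1_names, r2_names)
print(r1_names, r2_names)
```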

cell ranger

  • cr_localcores: number of cores for Cell Ranger to use, default: 36
  • cr_instance_ram: RAM in GB to make available to the cellranger count step, default: 64
  • fastq_dir: directory of the fastqs to run. If the fastqs need reformatting, use the r1_fastqs and r2_fastqs inputs instead
  • r1_fastqs: populate this if the read 1 fastqs need to be concatenated from an old format
  • r2_fastqs: populate this if the read 2 fastqs need to be concatenated from an old format
  • reference: directory of reference files
  • no_bam: set to skip generating BAM output. Keeping the BAM is useful for troubleshooting, but adds to computation time
  • chemistry: assay chemistry. One of:
    • auto: for auto-detection (default)
    • threeprime: for Single Cell 3′
    • fiveprime: for Single Cell 5′
    • SC3Pv2: for Single Cell 3′ v2
    • SC3Pv3: for Single Cell 3′ v3
    • SC3Pv3LT: for Single Cell 3′ v3 LT
    • SC3Pv3HT: for Single Cell 3′ v3 HT
    • SC5P-PE: for Single Cell 5′ paired-end (both R1 and R2 are used for alignment)
    • SC5P-R2: for Single Cell 5′ R2-only (where only R2 is used for alignment)
    • SC3Pv1: for Single Cell 3′ v1. NOTE: this mode cannot be auto-detected. It must be set explicitly with this option
    • ARC-v1: for analyzing the GEX portion of multiome data. NOTE: this mode cannot be auto-detected
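As a rough illustration of how these inputs could translate into a cellranger count command line, here is a Python sketch. The flags `--id`, `--fastqs`, `--sample`, `--transcriptome`, `--localcores`, `--localmem`, `--chemistry`, and `--no-bam` are cellranger count options in Cell Ranger 6.x. The mapping from workflow inputs to flags (for example cr_instance_ram to `--localmem`), the `run_id` parameter, and the paths are assumptions, not taken from the CWL.

```python
# Chemistry values listed above.
CHEMISTRIES = {
    "auto", "threeprime", "fiveprime", "SC3Pv2", "SC3Pv3", "SC3Pv3LT",
    "SC3Pv3HT", "SC5P-PE", "SC5P-R2", "SC3Pv1", "ARC-v1",
}


def cellranger_count_args(run_id, fastq_dir, sample_name, reference,
                          localcores=36, localmem=64,
                          no_bam=False, chemistry="auto"):
    """Assemble an argument list for `cellranger count` (assumed mapping)."""
    if chemistry not in CHEMISTRIES:
        raise ValueError(f"unsupported chemistry: {chemistry!r}")
    args = [
        "cellranger", "count",
        f"--id={run_id}",                # run_id is a hypothetical name
        f"--fastqs={fastq_dir}",
        f"--sample={sample_name}",
        f"--transcriptome={reference}",
        f"--localcores={localcores}",
        f"--localmem={localmem}",        # GB, assumed to come from cr_instance_ram
        f"--chemistry={chemistry}",
    ]
    if no_bam:
        args.append("--no-bam")
    return args


args = cellranger_count_args(
    run_id="example_run",                # placeholder values throughout
    fastq_dir="fastqs/",
    sample_name="1k_PBMCs_TotalSeq_B_3p_LT_antibody",
    reference="refdata/",
    no_bam=True,
    chemistry="SC3Pv3LT",
)
print(" ".join(args))
```

Because SC3Pv1 and ARC-v1 cannot be auto-detected, validating the chemistry value up front as above is one way to fail fast on a typo.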

seurat qc

  • seurat_qc_min_genes: minimum number of genes per cell
  • seurat_qc_max_mt: maximum percent mitochondrial reads per cell. Used as the fallback filter if miQC fails
  • seurat_qc_normalize_method: normalization method. One of log_norm or sct
  • seurat_qc_num_pcs: number of PCs to calculate

Outputs

  • bam_out: BAM generated by Cellranger Count
  • cellranger_matrix_raw: Raw feature matrix file from Cellranger
  • cellranger_matrix_filtered: Filtered feature matrix file from Cellranger
  • cellranger_cluster: CSV containing cluster information from Cellranger
  • debug_cr_file_outputs: TAR.GZ file of the output directory produced by Cellranger Count
  • seurat_qc_html: HTML of QC metrics generated by Seurat
  • seurat_qc_rds: RDS file of the Seurat object after QC. See the QC RDS Output section for details on its contents
  • seurat_raw_rds: RDS file of the Seurat object built from the original input counts

QC RDS Output (seurat_qc_rds)

Because this is a complex Seurat object RDS file, we have a separate doc outlining its contents here