This task is based on publicly available sequencing data from a study of Alzheimer’s Disease and Down Syndrome (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA975472). The dataset includes multiple samples under different conditions (AD vs Control) and was originally sequenced as Illumina paired-end 2×150 bp. Running the pipeline stores the subsampled FASTQs in sc_ad_boilerplate/data/, and these serve as the inputs for the workflow.
• BioProject: PRJNA975472
• GEO SuperSeries: GSE233208
• Paper: Miyoshi et al., Nature Genetics (2024)
• Assay: single-nucleus RNA-seq (10x)
| SRR | Group |
|---|---|
| SRR24710554 | Control |
| SRR24710556 | AD |
| SRR24710558 | Control |
| SRR24710560 | AD |
sc-alzheimer-analysis/
├── README.md
├── LICENSE
└── sc_ad_boilerplate/
    ├── metadata.yaml
    ├── samples.tsv
    └── workflow/
        ├── 01_fetch_and_fastq.sh
        ├── 02_build_ref.sh
        ├── 03_kb_count.sh
        ├── 04_scanpy_analysis.py
        ├── 05_question_answers.py
        └── environment.yml
The fetch step (01_fetch_and_fastq.sh) supports two download modes:
- Prefetch: download the .sra locally, then convert to FASTQ
- Stream: download and process directly without storing the .sra locally

Create and activate the environment:

conda env create -f sc_ad_boilerplate/workflow/environment.yml
conda activate ad_snrna
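Optionally, sanity-check that the key tools resolved. The exact package set is defined in environment.yml; the commands below assume it ships sra-tools, kb-python, and Scanpy, so adjust the names if they differ:

# quick environment check (tool set is an assumption; see environment.yml)
prefetch --version
fasterq-dump --version
kb ref --help | head -n 3
python -c "import scanpy as sc; print(sc.__version__)"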
# Download and convert to paired, subsampled FASTQ.gz
# defaults: RATE=0.10, THREADS=4
bash sc_ad_boilerplate/workflow/01_fetch_and_fastq.sh
# stream directly from SRA without saving .sra locally
STREAM=1 bash sc_ad_boilerplate/workflow/01_fetch_and_fastq.sh
# change subsample rate (e.g., 20%)
RATE=0.20 bash sc_ad_boilerplate/workflow/01_fetch_and_fastq.sh
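In prefetch mode the script roughly downloads the .sra first and then converts it, while STREAM=1 lets the converter pull the accession directly so no local .sra is kept. A simplified sketch with sra-tools (the script's exact flags and paths may differ):

# prefetch mode: store the .sra locally, then convert to paired FASTQs
prefetch SRR24710554
fasterq-dump --split-files --threads 4 SRR24710554

# stream-style mode: convert straight from the accession, no prefetched .sra kept
fasterq-dump --split-files --threads 4 -O data/fastq SRR24710554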
The script:
- Converts .sra to paired FASTQ files
- Subsamples reads in a paired-safe manner (default: 10%; see the sketch after this list)
- Stores the results in sc_ad_boilerplate/data/fastq_sub/
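Paired-safe here means both mates are subsampled with the same random seed so R1/R2 stay synchronized. A minimal sketch of that idea, assuming seqtk is the subsampler (the script may use a different tool), shown for one sample:

# same seed (-s) for both mates keeps read pairs matched
RATE=0.10
SEED=42
seqtk sample -s "$SEED" SRR24710554_1.fastq.gz "$RATE" | gzip > SRR24710554_1.sub.fastq.gz
seqtk sample -s "$SEED" SRR24710554_2.fastq.gz "$RATE" | gzip > SRR24710554_2.sub.fastq.gz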
Purpose: Prepare the kallisto|bustools reference for mouse (Mus musculus, GRCm38, Ensembl 98)
Tools: kb (kallisto|bustools)
Inputs: Ensembl FASTA + GTF (downloaded automatically)
Outputs: workflow/ref/index.idx, t2g.txt, and FASTA files
bash sc_ad_boilerplate/workflow/02_build_ref.sh
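The heavy lifting is a standard kb ref invocation; a sketch of what that looks like (file names below are illustrative, 02_build_ref.sh handles the actual Ensembl 98 download and paths):

# build the kallisto index and transcript-to-gene mapping from genome FASTA + GTF
kb ref \
  -i workflow/ref/index.idx \
  -g workflow/ref/t2g.txt \
  -f1 workflow/ref/cdna.fa \
  Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
  Mus_musculus.GRCm38.98.gtf.gz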
Purpose: Quantify transcripts from the paired, subsampled FASTQs
Tools: kb (kallisto|bustools)
Inputs: subsampled FASTQs from data/fastq_sub/
Outputs: matrices under workflow/kb_out/<SRR>/counts_unfiltered/
bash sc_ad_boilerplate/workflow/03_kb_count.sh
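Per sample this corresponds to a kb count run against the index from the previous step; a sketch for one SRR (the -x chemistry flag is an assumption, check 03_kb_count.sh for the exact arguments):

# quantify one sample; -x 10xv3 is an assumption about the library chemistry
kb count \
  -i workflow/ref/index.idx \
  -g workflow/ref/t2g.txt \
  -x 10xv3 \
  -t 4 \
  -o workflow/kb_out/SRR24710554 \
  sc_ad_boilerplate/data/fastq_sub/SRR24710554_1.sub.fastq.gz \
  sc_ad_boilerplate/data/fastq_sub/SRR24710554_2.sub.fastq.gz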
Purpose: Merge samples, perform QC, UMAP, and clustering, and compare AD vs Control
Tools: Scanpy (v1.10.2)
Inputs: matrices from workflow/kb_out/*/counts_unfiltered/
Outputs: figures and tables in workflow/scanpy_out/
python sc_ad_boilerplate/workflow/04_scanpy_analysis.py
- sc_ad_boilerplate/workflow/scanpy_out/umap_overview.png – UMAP of all cells
- sc_ad_boilerplate/workflow/scanpy_out/cell_counts_by_sample_group.csv – cell counts per sample & group
- sc_ad_boilerplate/workflow/scanpy_out/markers_per_cluster_wilcoxon.csv – cluster markers
- sc_ad_boilerplate/workflow/scanpy_out/DE_AD_vs_Control_per_cluster.csv – DE results per cluster
- sc_ad_boilerplate/workflow/scanpy_out/alz_snrna_merged.h5ad – merged AnnData object
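A quick way to eyeball the tabular outputs from the command line (illustrative one-liners; any CSV viewer works):

# per-sample / per-group cell counts
column -s ',' -t < sc_ad_boilerplate/workflow/scanpy_out/cell_counts_by_sample_group.csv

# top of the per-cluster AD vs Control DE table
head -n 5 sc_ad_boilerplate/workflow/scanpy_out/DE_AD_vs_Control_per_cluster.csv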
Read counts check:
zcat sc_ad_boilerplate/data/fastq_sub/<SRR>_1.sub.fastq.gz | wc -l
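Since each FASTQ record spans four lines, divide the line count by four to get the number of reads, for example:

# number of reads in one subsampled mate (lines / 4)
echo $(( $(zcat sc_ad_boilerplate/data/fastq_sub/<SRR>_1.sub.fastq.gz | wc -l) / 4 ))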
Notes: This pipeline was run on Google Colab with a high-RAM CPU runtime (51 GB).