This project contains the full single-cell RNA sequencing (scRNA-seq) analysis of Parhyale hawaiensis embryos. The dataset includes both intact and ablated embryos sampled at the beginning, middle, and end of the regenerative replacement process. The analysis is implemented in Python using scanpy, with preprocessing and alignment in Bash using STARsolo.
scParhyale/ ├── data/ # Raw and processed count matrices │ ├── raw/ # Filtered STARsolo outputs (barcodes, genes, matrix) │ └── processed/ # Final AnnData objects (.h5ad) ├── metadata/ # Sample annotations (e.g. library, stage, genotype) ├── notebooks/ # Python notebooks for downstream analysis ├── results/ # QC plots, analysis outputs ├── scripts/ # Bash scripts used for preprocessing (e.g. STARsolo) ├── env/ # Conda environment setup ├── README.md └── .gitignore
- STAR v2.7.11a for genome indexing and alignment
- STARsolo for single-cell quantification
- PIPseq T10 v5.0 kit (3′ scRNA-seq, paired-end 2×50 bp)
- scanpy for downstream Python analysis (to be developed)
- Read 1 contains:
- 16 bp cell barcode (
CB:1-16) - 12 bp UMI (
UB:17-28)
- 16 bp cell barcode (
- Read 2 contains the transcript sequence
- Sequencing depth and layout as expected from Illumina NovaSeq 6000
- Make project folders (optional but tidy)
mkdir -p ~/workspace/scRNAseq/tools
cd ~/workspace/scRNAseq/tools
- Download & compile STAR locally
wget https://github.com/alexdobin/STAR/archive/refs/tags/2.7.11a.tar.gz
tar -xvzf 2.7.11a.tar.gz
cd STAR-2.7.11a/source
make STARThis will create the STAR executable in:
~/workspace/scRNAseq/tools/STAR-2.7.11a/source/STAR
You can test it doing
./STAR- Add it to your PATH (optional but useful)
Edit your ~/.bashrc (or ~/.zshrc if using zsh):
nano ~/.zshrcAdd this line at the end:
export PATH="$HOME/workspace/scRNAseq/tools/STAR-2.7.11a/source:$PATH"then reload:
source ~/.zshrc- Test
STAR --versionThe reference genome was indexed using STAR with the following files:
- Genome FASTA:
genomeV5.fa - Gene annotations:
- Old
Gene_Models_210210.gtf - New (Matilde Paris)
Annotation_Reg_Embryo_500_cleanOverlap0.5_3_restranded_exp345.gtf
- Old
Command used (on a high-RAM workstation):
~/workspace/scRNAseq/tools/STAR-2.7.11a/source/STAR \
--runThreadN 60 \
--runMode genomeGenerate \
--genomeDir ~/workspace/scRNAseq/genome_index \
--genomeFastaFiles ~/workspace/scRNAseq/genome/genomeV5.fa \
--sjdbGTFfile ~/workspace/scRNAseq/genome/Annotation_Reg_Embryo_500_cleanOverlap0.5_3_restranded_exp345.gtf \
--sjdbOverhang 59 \
--limitGenomeGenerateRAM 480000000000
• sjdbOverhang = read length - 1 (we used 59 for 60 bp reads)
• Output was saved to genome_index/
##🎯 Read alignment with STARsolo
Read alignment and quantification were performed using STARsolo in CB_UMI_Complex mode with the following parameters: • --soloType CB_UMI_Complex • --soloCBstart 1 --soloCBlen 16 • --soloUMIstart 17 --soloUMIlen 12 • --soloFeatures Gene • Filtered matrix output was written to Solo.out/Gene/filtered/ for each library.
To automate the alignment of all libraries (lib01 to lib06), we created a Bash script:
📄 scripts/run_STARsolo_all.sh
This script loops through all FASTQ files and runs STARsolo using the indexed genome and appropriate parameters.
Remember to give permissions before running and making sure all paths are correctly specified
chmod +x ~/workspace/scRNAseq/code/run_STARsolo_all.shIn our workstation it took about 40h to complete with the above specified resources. Run with nohup so that you can disconect from the worktation or close the terminal and it will continue.
nohup ~/workspace/scRNAseq/code/run_STARsolo_all.sh > ~/workspace/scRNAseq/code/run_log.txt 2>&1 &You can later check the progress...
tail -f ~/workspace/scRNAseq/code/run_log.txtIt will say sth like... processing library 1
##👤 Author
Miquel Sendra Guignard Lab, IBDM (Marseille) 📧 miquel.sendra-ortola@univ-amu.fr 🔬 Project with Tassos Pavlopoulos and Léo Guignard