Data package for sarcoma RNA-seq data from PRJNA282597
- Experimental data were generated by Lesluyes et al. Original citations:
  - Lesluyes T, Pérot G, Largeau MR, Brulard C et al. RNA sequencing validation of the Complexity INdex in SARComas prognostic signature. Eur J Cancer 2016 Apr;57:104-11. PMID: 26916546
  - Lesluyes T, Baud J, Pérot G, Charon-Barra C et al. Genomic and transcriptomic comparison of post-radiation versus sporadic sarcomas. Mod Pathol 2019 Dec;32(12):1786-1794. PMID: 31243333
- Processing:
  - Sequencing reads were downloaded from SRA (PRJNA282597).
  - Quantification was done with two alternative workflows:
    - kallisto 0.45.0, with an index built from human genome GRCh38.99 and 92 ERCC sequences
    - STAR 2.7.1a to align against the GENCODE human genome v27 (GRCh38.p10) plus 92 ERCC sequences, followed by RSEM to estimate abundance levels for genes and isoforms
- Metadata: compiled from SRA, the GEO SOFT-formatted file, and information extracted from the sequence identifiers in the FASTQ files.
Install the package, import the library, and load the ExpressionSet of interest, for example:
devtools::install_github('ttdtrang/data-rnaseq-sarcoma')
data(sarcoma.rnaseq.gene.kallisto, package='data.rnaseq.sarcoma')
dim(sarcoma.rnaseq.gene.kallisto@assayData$exprs)
The package includes 4 data sets.
sarcoma.rnaseq.gene.kallisto
sarcoma.rnaseq.transcript.kallisto
sarcoma.rnaseq.gene.star_rsem
sarcoma.rnaseq.transcript.star_rsem
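To compare the four data sets side by side, something like the following may help. It is a minimal sketch, not part of the package: it assumes that Biobase is installed and that each data set can be loaded by the same name as the object listed above, which has not been verified against the package contents.

```r
# Names of the four ExpressionSet objects listed above.
dataset_names <- c(
  "sarcoma.rnaseq.gene.kallisto",
  "sarcoma.rnaseq.transcript.kallisto",
  "sarcoma.rnaseq.gene.star_rsem",
  "sarcoma.rnaseq.transcript.star_rsem"
)

# Only attempt loading when both packages are installed.
if (requireNamespace("data.rnaseq.sarcoma", quietly = TRUE) &&
    requireNamespace("Biobase", quietly = TRUE)) {
  for (name in dataset_names) {
    # Assumption: the data file name matches the object name.
    data(list = name, package = "data.rnaseq.sarcoma")
    if (!exists(name)) next
    eset <- get(name)
    # exprs() returns the features-by-samples expression matrix.
    cat(name, ":", paste(dim(Biobase::exprs(eset)), collapse = " x "), "\n")
    # pData() returns the per-sample metadata table.
    cat("  sample annotation columns:", ncol(Biobase::pData(eset)), "\n")
  }
}
```

The accessors `exprs()` and `pData()` are the standard Biobase way to reach the same slots that the `@assayData$exprs` example reads directly.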
cd data-raw
- Download all necessary raw data files.
- Set the environment variable DBDIR to point to the directory containing these files. The files are assumed to be organized into directories by workflow, e.g.
├── kallisto
│   ├── feature_attributes.tsv
│   ├── matrix.est_counts.RDS
│   ├── matrix.gene.est_counts.RDS
│   ├── matrix.gene.tpm.RDS
│   └── matrix.tpm.RDS
├── PRJNA282597_metadata_cleaned.tsv
├── fastq_metadata.tsv
└── star-rsem
    ├── feature_attrs.rsem.transcripts.tsv
    ├── matrix.gene.expected_count.RDS
    ├── matrix.gene.tpm.RDS
    ├── matrix.transcripts.expected_count.RDS
    ├── matrix.transcripts.tpm.RDS
    └── starLog.final.tsv
- Run the R notebook make-data-package.Rmd to assemble the parts into ExpressionSet objects.
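The steps above can be sketched in R as follows. This is a minimal sketch rather than the repository's own script: `missing_inputs` is a hypothetical helper, the file list is copied from the layout shown above, and the final step assumes that the rmarkdown package is installed and that the working directory is data-raw.

```r
# Relative paths expected under DBDIR, copied from the layout above.
expected_files <- c(
  file.path("kallisto", c(
    "feature_attributes.tsv",
    "matrix.est_counts.RDS",
    "matrix.gene.est_counts.RDS",
    "matrix.gene.tpm.RDS",
    "matrix.tpm.RDS"
  )),
  "PRJNA282597_metadata_cleaned.tsv",
  "fastq_metadata.tsv",
  file.path("star-rsem", c(
    "feature_attrs.rsem.transcripts.tsv",
    "matrix.gene.expected_count.RDS",
    "matrix.gene.tpm.RDS",
    "matrix.transcripts.expected_count.RDS",
    "matrix.transcripts.tpm.RDS",
    "starLog.final.tsv"
  ))
)

# Hypothetical helper: list the expected files that are absent under dbdir.
missing_inputs <- function(dbdir) {
  expected_files[!file.exists(file.path(dbdir, expected_files))]
}

dbdir <- Sys.getenv("DBDIR")
notebook <- "make-data-package.Rmd"

if (!nzchar(dbdir)) {
  message("DBDIR is not set.")
} else if (length(missing_inputs(dbdir)) > 0) {
  message("Missing under DBDIR: ",
          paste(missing_inputs(dbdir), collapse = ", "))
} else if (file.exists(notebook) &&
           requireNamespace("rmarkdown", quietly = TRUE)) {
  # Render the notebook only when every expected input is present.
  rmarkdown::render(notebook)
}
```

Checking the inputs before rendering gives a short list of missing files up front, instead of a failure partway through the notebook.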