Data repository for "An in vitro model of tumor heterogeneity resolves genetic, epigenetic, and stochastic sources of cell state variability," Hayford et al. (2021), PLoS Biology 19: e3000797; DOI: 10.1371/journal.pbio.3000797
Instructions for creating panels in all main and supplementary figures based on experimental and simulated data in this repository
-
-
Panels A and C: In the DrugResponse directory, run DrugResponse.R, which pulls data from the twoParental-*.csv files in the directory and the well conditions in the DrugResponse/Platemaps subdirectory.
Panels B and D: In the cFP directory, run cFP.R, which pulls data from the 10cFP_*.csv files in the directory.
-
Panel A: In the WES directory, run WES.R, which pulls data from mutations_byChromosome.csv.
Panels B, C, and D: In the WES directory, run WES.R, which pulls data from the vep_*.txt files in the directory and uses the database in the RData object in RefCDS_human_GRCH38.p12.rda to cross-reference variants. NOTE: The vep_*.txt files must be manually unzipped before running WES.R.
Panel E: In the scRNAseq/inferCNV subdirectory, run inferCNV.R, which pulls a counts matrix from the RData object in PC9.CLV.10x.counts.matrix.rds, included in the directory. Necessary annotation and gene order files are also provided.
Panel F: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories. Scripts to de-multiplex hashed raw data and outputs are included in the scRNAseq/HTO_identification subdirectory. A full matrix of de-multiplexed counts is included as PC9_scRNAseqCounts_HTOdemux.csv.zip.
Panel G: In the GO directory, run GO_correlation.R, which pulls data from mutations_DEGs-hg38.RData, a file that compiles all IMPACT genetic mutations (from the WES directory) and differentially expressed genes (DEGs; from the scRNAseq directory).
-
Panel A: In the WES directory, run WES.R, which pulls data from mutations_byChromosome.csv.
Panels B, C, and D: In the WES directory, run WES.R, which pulls data from the vep_*.txt files in the directory and uses the database in the RData object in RefCDS_human_GRCH38.p12.rda to cross-reference variants.
Panel E: In the scRNAseq/inferCNV subdirectory, run inferCNV.R, which pulls a counts matrix from the RData object in PC9.VUDS.10x.counts.matrix.rds (created in inferCNV.R). Necessary annotation and gene order files are also provided.
Panel F: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories. Scripts to de-multiplex hashed raw data and outputs are provided in the scRNAseq/HTO_identification subdirectory. A full matrix of de-multiplexed counts is included as PC9_scRNAseqCounts_HTOdemux.csv.zip.
Panel G: In the GO directory, run GO_correlation.R, which pulls data from mutations_DEGs-hg38.RData, a file that compiles all IMPACT genetic mutations (from the WES directory) and differentially expressed genes (DEGs; from the scRNAseq directory).
-
Panels A and E: In the cFP directory, run cFP.R, which pulls data from the trajectories_*.csv files in the directory.
Panels B and F: In the cFP directory, run cFP.R, which pulls simulated data from the trajectories_*.csv files in the directory. Model trajectories are representative examples of a larger simulation scan (*.py models in the Simulations directory).
Panels C and G: In the cFP directory, run cFP.R, which pulls simulated data from the distributions_*.csv files in the directory. Model distributions were calculated from example trajectories as part of a larger simulation scan (*.py models in the Simulations directory). For each subline, the mean and confidence interval reported on the plot are calculated from 100 bootstrapped p-values provided in one of the ADbootstrap*.csv files.
Panels D and H: In the Simulations directory, run plotParameterScan.R, which pulls data from the *_lowVal.csv files in the directory.
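The mean and confidence interval described for Panels C and G can be illustrated with a short sketch. This is not the code in cFP.R: the function name summarize_bootstrap, the choice of a percentile interval, and the 95% level are all assumptions made for illustration, and the example values are synthetic placeholders rather than data from the ADbootstrap*.csv files.

```python
import random
import statistics


def summarize_bootstrap(p_values, level=0.95):
    """Return (mean, lower, upper) for a list of bootstrapped p-values.

    The interval is a simple nearest-rank percentile interval. This is an
    assumption; the repository's R script may compute it differently.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    ordered = sorted(p_values)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    # Nearest-rank indices for the lower and upper bounds, clamped to [0, n-1].
    lo_idx = max(0, min(n - 1, int(alpha * n)))
    hi_idx = max(0, min(n - 1, int((1.0 - alpha) * n) - 1))
    return statistics.fmean(ordered), ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic placeholder values, not repository data
    fake_p_values = [rng.random() for _ in range(100)]
    mean, lower, upper = summarize_bootstrap(fake_p_values)
    print(f"mean={mean:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

With 100 values and a 95% level, the sketch reports the 3rd smallest and 97th smallest values as the interval bounds. Other interval definitions (for example, a normal approximation) would give different numbers.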
-
-
Panel A: Screenshot of the EGFR gene from the Integrative Genomics Viewer (IGV) based on raw exome sequencing data (available in the Sequence Read Archive (SRA) at accession #PRJNA632351). Image is stored as PC9-EGFRgene_mutations_ex19delCommon.svg in the WES directory.
Panel B: N/A
-
Panels A, B, and C: In the cFP directory, run cFP.R, which pulls data from the trajectories_*.csv files in the directory. Data from overlays in panel C come from the PopD_trajectories.RData object.
-
Panel A: In the WES directory, run WES.R, which pulls data from number_mutations.csv in the directory.
Panel B: In the WES directory, run WES.R, which pulls data from samples_called_vars_named.vcf.gz in the directory. Directions to download reference FASTA and GTF files are provided in WES.R.
Panel C: In the WES directory, run WES.R, which pulls data from shared_variants_CLV.csv in the directory.
Panel D: In the WES directory, run WES.R, which pulls data from shared_variants_sublines.csv in the directory.
Panel E: In the WES directory, run WES.R, which pulls data from shared_variants_VUDSlines.csv in the directory.
-
Panels A and B: In the WES directory, run WES.R, which pulls data from the vep_*.txt files in the directory and uses the database in the RData object in RefCDS_human_GRCH38.p12.rda to cross-reference variants.
-
Panel A: Screenshot of the summarized output from the Cell Ranger quality control analysis on the scRNA-seq library (available in the Gene Expression Omnibus (GEO) data repository at accession #GSE150084). Settings are shown in the image, which is stored as CellRanger_PC9.svg in the scRNAseq directory.
Panel B: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories.
-
Panels A and B: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories and subsets data by cell line versions.
Panels C and D: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories and subsets data by sublines.
Panels E and F: In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories.
-
Panels A and B: In the RNAseq directory, run RNAseq.R, which pulls from all 8 *_featurecounts.txt files in the directory. These files were created using the Bash script in RNAseq_processing.txt. NOTE: The *_featurecounts.txt files must be manually unzipped before running RNAseq.R.
-
In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories. Input hallmark gene signature (.gmt) files can be found in the scRNAseq/VISION_gmt/hallmark subdirectory.
-
In the GO directory, run semanticSimilarity.R, which pulls data from mutations_DEGs-hg38.RData, a file that compiles all IMPACT genetic mutations (from the WES directory) and differentially expressed genes (DEGs; from the scRNAseq directory). Directions for downloading the reference GTF file are provided in semanticSimilarity.R.
-
In the scRNAseq/inferCNV directory, run inferCNV.R, which pulls a counts matrix from the RData object in PC9.VUDS.10x.counts.matrix.rds (created in inferCNV.R). Necessary annotation and gene order files are also provided in the directory.
-
Panel A: In the cFP directory, run cFP.R, which pulls data from the trajectories_*.csv files in the directory.
Panel B: In the cFP directory, run cFP.R, which pulls data from the trajectories_*.csv files in the directory. Model trajectories are representative examples of a larger simulation scan (*.py models in the Simulations directory).
Panel C: In the cFP directory, run cFP.R, which pulls simulated data from the distributions_*.csv files in the directory. Model distributions were calculated from example trajectories as part of a larger simulation scan (*.py models in the Simulations directory). For each subline, the mean and confidence interval reported on the plot are calculated from 100 bootstrapped p-values provided in one of the ADbootstrap*.csv files.
Panel D: In the Simulations directory, run plotParameterScan.R, which pulls from the *_lowVal.csv files in the directory.
-
Panels A and B: In the WES directory, run WES.R, which pulls data from samples_called_vars_named.vcf.gz in the directory.
-
In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories.
-
In the scRNAseq directory, run scRNAseq.R, which pulls from 10x Genomics reduced data in the scRNAseq/read_count and scRNAseq/umi_count subdirectories.
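Two of the scripts above (WES.R and RNAseq.R) expect their input files to be unzipped by hand first. One way to do that in bulk might look like the sketch below. It assumes the compressed inputs are ordinary .zip archives sitting directly in the script's directory, which this README does not state explicitly, and the helper name extract_zip_archives is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_zip_archives(directory):
    """Extract every *.zip archive found directly inside `directory`.

    Files are extracted next to their archives. Returns the names of all
    extracted members. Assumes plain .zip archives (an assumption; other
    compression formats would need a different tool).
    """
    directory = Path(directory)
    extracted = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
    return extracted
```

For example, calling extract_zip_archives("WES") from the repository root and then running WES.R in the WES directory would follow the order given in the NOTE for the vep_*.txt files. The same call with "RNAseq" would cover the *_featurecounts.txt files.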
-