This analysis extends some of my previous work, developing genotype-to-phenotype prediction methods based on omics data from 1,011 S. cerevisiae strains (see Peter et al. 2018). The transcriptomics data was provided by the Schacherer lab and the proteomics data by the Ralser lab. Neither dataset is currently available. The growth phenotypes were measured by members of the Beltrao lab, where I performed this work, as detailed in Galardini et al. (2019).
This repo contains several phenotype prediction analyses based on genotype (modelled with P(Aff) scores), gene expression and abundance scores, expressed as fold changes relative to each gene's median expression:
- Associations and correlations between P(Aff), expression and abundance, showing generally weak relationships.
- Phenotype prediction based on linear models using the first 50 PCs of the P(Aff), abundance and expression scores.
- Variational Auto-Encoder (VAE) based linear phenotype prediction models, using a custom VAE implementation.
- Gene-based linear models, assessing the strength of association between each gene and phenotype, based on genotype, expression and abundance.
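
As a rough illustration of the fold-change scaling mentioned above, the sketch below divides one gene's expression in each strain by that gene's median across strains. It is a minimal sketch under stated assumptions, not the code used in this repo: the function name `fold_change_vs_median`, the strain labels and the values are all hypothetical, and whether the repo's scripts use plain or log-transformed fold changes is not stated here.

```python
from statistics import median


def fold_change_vs_median(expression):
    """Scale one gene's per-strain values by that gene's median across strains.

    `expression` maps strain name -> expression (or abundance) value.
    Hypothetical helper for illustration only.
    """
    med = median(expression.values())
    if med == 0:
        raise ValueError("median is zero; fold change is undefined")
    return {strain: value / med for strain, value in expression.items()}


# Made-up values for a single gene across five strains (not real data)
gene_expression = {
    "strain_A": 8.0,
    "strain_B": 4.0,
    "strain_C": 2.0,
    "strain_D": 4.0,
    "strain_E": 16.0,
}

fold_changes = fold_change_vs_median(gene_expression)
for strain, fc in fold_changes.items():
    print(strain, fc)
```

A strain with a fold change of 1.0 sits at the gene's median, values above 1.0 indicate higher expression than the median strain, and values below 1.0 indicate lower expression.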
The project is split as follows:
- bin
  - analysis: Scripts performing data analysis and figure generation
  - processing: Scripts to parse, format and normalise the raw data
  - util: Additional utility scripts
- docs: Two lists of genes identified
- src: Shared modules and R config