This repository contains code to reproduce the results presented in the paper "Matrix linear models for high-throughput chemical genetic screens".
Analysis was primarily performed in Julia [1], and visualizations were created using R [2]. The Julia package associated with this paper is GeneticScreens, which extends the MatrixLM package.
The genetic screening data from Nichols et al. (2011) [3] used for analysis is available here. Once downloaded, it should be saved in the data/raw_KEIO_data/ directory.
Running auxotroph.R additionally requires downloading the following tables and saving them as CSVs in the data/ directory.
- Supplemental Table 4 in Nichols et al. (2011) [3]
- Supplemental Table 1 in Joyce et al. (2006) [4]
- Supplemental Table 3 in Kritikos et al. (2017) [5]
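The directory layout described above can be checked before running any analysis. The following is a minimal Python sketch and is not part of the repository. The function name `check_data_layout` is hypothetical, and because this README does not give the file names of the raw data or of the three CSVs, the sketch only checks that the directories exist and counts the CSV files.

```python
from pathlib import Path


def check_data_layout(repo_root="."):
    """Return a list of problems; an empty list means the layout looks plausible."""
    root = Path(repo_root)
    problems = []

    # The raw screening data is expected in data/raw_KEIO_data/.
    raw_dir = root / "data" / "raw_KEIO_data"
    if not raw_dir.is_dir():
        problems.append(f"missing directory: {raw_dir}")
    elif not any(raw_dir.iterdir()):
        problems.append(f"no files found in: {raw_dir}")

    # auxotroph.R needs three supplemental tables saved as CSVs in data/.
    # Their file names are not specified here, so only the count is checked.
    data_dir = root / "data"
    n_csv = len(list(data_dir.glob("*.csv"))) if data_dir.is_dir() else 0
    if n_csv < 3:
        problems.append(
            f"expected at least 3 CSV files in {data_dir}, found {n_csv}"
        )

    return problems


if __name__ == "__main__":
    for problem in check_data_layout():
        print(problem)
```

Run from the repository root, the script prints nothing when both checks pass. The CSV check only matters if you intend to run auxotroph.R.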
- preprocess.R: Preprocess the data from Nichols et al. (2011) [3]. Requires the raw data to be downloaded and saved to data/raw_KEIO_data/, and should be run before any of the other files.
- compare_times.jl: Compare the runtimes of matrix linear models and the S scores of Collins et al. (2006) [6].
- dosage.jl: Run matrix linear models (dosage-response and condition-concentrations) and S scores (condition-concentrations) on the data from Nichols et al. (2011) [3].
- auxotroph.R: Reproduce the plots that check for auxotrophs against the lists provided in Supplemental Table 4 of Nichols et al. (2011) [3] and Supplemental Table 1 of Joyce et al. (2006) [4], as well as the analysis of the S scores from Kritikos et al. (2017) [5] (Supplemental Table 3).
- dosage.R: Reproduce the plots of the proportion of hits detected by the dosage-response approach compared to matrix linear models (condition-concentrations) and S scores.
- sim.jl: Run matrix linear models and S scores on simulated data.
- sim.R: Reproduce the ROC plots comparing matrix linear models and S scores.
- sim_null.jl: Permute simulated data to calculate the Type I error for matrix linear models.
- dosage_sim.jl: Run matrix linear models (dosage-response) and S scores (condition-concentrations and conditions only) on simulated data.
- dosage_sim.R: Reproduce the ROC plots comparing matrix linear models (dosage-response) and S scores (condition-concentrations and conditions only).
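One possible run order for the files listed above is sketched below in Python. This helper is not part of the repository. The only ordering stated in this README is that preprocess.R runs first. Placing each Julia script before the R script that plots its results is an assumption based on the file descriptions. The sketch also assumes that the scripts sit in the current working directory and that the `Rscript` and `julia` executables are on the PATH.

```python
import subprocess

# Assumed order: preprocess.R first (stated in the README), then each Julia
# analysis script before the R script that plots its results (an assumption).
PIPELINE = [
    "preprocess.R",
    "compare_times.jl",
    "dosage.jl",
    "auxotroph.R",
    "dosage.R",
    "sim.jl",
    "sim.R",
    "sim_null.jl",
    "dosage_sim.jl",
    "dosage_sim.R",
]


def command_for(script):
    """Map a script name to the interpreter command that runs it."""
    if script.endswith(".R"):
        return ["Rscript", script]
    if script.endswith(".jl"):
        return ["julia", script]
    raise ValueError(f"unrecognized script type: {script}")


def run_all(dry_run=True):
    """Print each command; when dry_run is False, also execute it.

    Execution stops at the first script that exits with an error.
    """
    commands = [command_for(script) for script in PIPELINE]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

By default the script only prints the commands. Calling `run_all(dry_run=False)` executes them in order, which may take a long time for the simulation scripts.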
1. Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98.
2. R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
3. Nichols, R. J., Sen, S., Choo, Y. J., Beltrao, P., Zietek, M., Chaba, R., Lee, S., Kazmierczak, K. M., Lee, K. J., Wong, A., et al. (2011). Phenotypic landscape of a bacterial cell. Cell, 144(1):143–156.
4. Joyce, A. R., Reed, J. L., White, A., Edwards, R., Osterman, A., Baba, T., Mori, H., Lesely, S. A., Palsson, B. Ø., and Agarwalla, S. (2006). Experimental and computational assessment of conditionally essential genes in Escherichia coli. Journal of Bacteriology, 188(23):8259–8271.
5. Kritikos, G., Banzhaf, M., Herrera-Dominguez, L., Koumoutsi, A., Wartel, M., Zietek, M., and Typas, A. (2017). A tool named Iris for versatile high-throughput phenotyping in microorganisms. Nature Microbiology, 2(5):17014.
6. Collins, S. R., Schuldiner, M., Krogan, N. J., and Weissman, J. S. (2006). A strategy for extracting and analyzing large-scale quantitative epistatic interaction data. Genome Biology, 7(7):R63.