Python analysis of the LINCS L1000 data.
The repository consists of Python notebooks, which are executed in the following order:
1. api.ipynb retrieves metadata from the L1000 API. Retrieved data is converted into a dataframe and saved as a TSV. Files are created for perturbations, signatures, cells, and probes.
2. database.ipynb creates a SQLite database containing the metadata retrieved from the API. Data cleaning occurs here. The database resides at data/l1000.db but is excluded from the repository due to its file size. However, the populated database is available on figshare.
3. unichem.ipynb maps compounds to external databases and adds the mapping to the database. See this comment for more information.
4. chemical-similarity.ipynb computes chemical similarities between compounds and adds these similarities to the database.
5. consensi.ipynb computes consensus signatures for each perturbagen. The following consensus files are created:
   - consensi-drugbank.tsv.bz2 with consensus signatures for each mapped DrugBank compound
   - consensi-knockdown.tsv.bz2 with consensus signatures for each gene knockdown
   - consensi-overexpression.tsv.bz2 with consensus signatures for each gene over-expression
   - consensi-pert_id.tsv.bz2 with consensus signatures for each L1000 pert_id. This file is too large for GitHub (500 MB), but is available on figshare.
6. significance.ipynb converts consensus z-scores into significant up/down-regulation values. The following files are created:
   - DrugBank dysregulated genes (dysreg-drugbank.tsv) and counts (dysreg-drugbank-summary.tsv)
   - Knockdown dysregulated genes (dysreg-knockdown.tsv) and counts (dysreg-knockdown-summary.tsv)
   - Overexpression dysregulated genes (dysreg-overexpression.tsv) and counts (dysreg-overexpression-summary.tsv)
   - All perturbagens dysregulated genes (dysreg-pert_id.tsv.gz) and counts (dysreg-pert_id-summary.tsv)
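Once the SQLite database from database.ipynb exists locally (built by the notebook or downloaded from figshare), it can be inspected with Python's standard library. The sketch below lists every table with its row count and makes no assumption about table names. The helper name summarize_database is hypothetical and not part of this repository; the path data/l1000.db is the location stated above.

```python
import sqlite3
from pathlib import Path


def summarize_database(path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found")
    connection = sqlite3.connect(str(path))
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = connection.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        connection.close()


if __name__ == "__main__":
    # Location given in the README; adjust if the database lives elsewhere.
    database_path = Path("data/l1000.db")
    if database_path.exists():
        for table, count in summarize_database(database_path).items():
            print(table, count, sep="\t")
```

Reading the table list from sqlite_master keeps the sketch independent of the actual schema, which is defined in database.ipynb.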
See this comment for more information on steps 5 & 6.
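To make the idea behind step 6 concrete, the sketch below turns a set of consensus z-scores into up- and down-regulated gene calls. It is only an illustration under stated assumptions: the two-sided normal test, the Bonferroni adjustment, the default alpha, the function name call_dysregulation, and the placeholder gene identifiers are all hypothetical, and the procedure actually used in significance.ipynb may differ.

```python
from statistics import NormalDist


def call_dysregulation(z_scores, alpha=0.05):
    """Split genes into up- and down-regulated lists.

    z_scores maps a gene identifier to its consensus z-score. A gene is
    called when its two-sided normal p-value falls below alpha divided by
    the number of genes tested (Bonferroni). The test, the correction, and
    the default alpha are assumptions made for this illustration.
    """
    calls = {"up": [], "down": []}
    if not z_scores:
        return calls
    standard_normal = NormalDist()
    adjusted_alpha = alpha / len(z_scores)
    for gene, z in z_scores.items():
        # Two-sided p-value under a standard normal null distribution.
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
        if p_value < adjusted_alpha:
            calls["up" if z > 0 else "down"].append(gene)
    return calls


# Placeholder identifiers and z-scores, not real LINCS L1000 values.
example = {"GENE_A": 5.0, "GENE_B": -4.5, "GENE_C": 0.3}
print(call_dysregulation(example))
```

Dividing alpha by the number of genes is the most conservative common correction; a false discovery rate procedure would call more genes at the same nominal level.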
Note: This is not an official LINCS L1000 repository. Users are warned that our modifications may have introduced errors or removed signal that was present in the original data.
This repository depends on modzs.gctx, a legacy probe × signature matrix of differential expression z-scores. Due to its large file size (42.5 GB), this file is not uploaded to GitHub. To recreate this analysis rather than just use the results, users should retrieve modzs.gctx from figshare and place it in the download directory.
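Before running the notebooks, it can help to confirm that the matrix is where they expect it. The following is a minimal sketch using only the standard library. The helper names locate_modzs and size_in_gb are hypothetical, and the sketch assumes the download directory sits at the repository root.

```python
from pathlib import Path


def locate_modzs(repository_root="."):
    """Return the path to download/modzs.gctx, or raise if it is missing.

    Assumes the download directory is at the repository root.
    """
    path = Path(repository_root) / "download" / "modzs.gctx"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} is missing; retrieve modzs.gctx from figshare and place it there"
        )
    return path


def size_in_gb(path):
    """Return the file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return Path(path).stat().st_size / 1e9


if __name__ == "__main__":
    try:
        matrix_path = locate_modzs()
    except FileNotFoundError as error:
        print(error)
    else:
        # A size far below the stated 42.5 GB suggests an incomplete download.
        print(f"{matrix_path}: {size_in_gb(matrix_path):.1f} GB")
```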
See the Transcriptional signatures of perturbation from LINCS L1000 section of the Rephetio manuscript for the final description of this work. Citations related to this repository are below:
- Systematic integration of biomedical knowledge prioritizes drugs for repurposing
  Daniel Scott Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
  eLife (2017-09-22) https://doi.org/cdfk
  DOI: 10.7554/elife.26726 · PMID: 28936969 · PMCID: PMC5640425
- Consensus signatures for LINCS L1000 perturbations
  Daniel Himmelstein, Leo Brueggeman, Sergio Baranzini
  Figshare (2016-03-08) https://doi.org/f3mqvs
  DOI: 10.6084/m9.figshare.3085426.v1
- dhimmel/lincs v2.0: Refined Consensus Signatures From Lincs L1000
  Daniel Himmelstein, Leo Brueggeman, Sergio Baranzini
  Zenodo (2016-03-08) https://doi.org/f3mqvr
  DOI: 10.5281/zenodo.47223
- Computing consensus transcriptional profiles for LINCS L1000 perturbations
  Daniel Himmelstein, Caty Chung
  ThinkLab (2015-03-26) https://doi.org/f3mqwc
  DOI: 10.15363/thinklab.d43
Create the conda environment for this repository using:
conda env create --file environment.yml
All original content in this repository is released under CC0 1.0. LINCS data and derivatives are released under CC BY 4.0 — please refer to the LINCS data policy and attribute this repository and LINCS L1000.