scHPF is a tool for de novo discovery of both discrete and continuous expression patterns in single-cell RNA-sequencing (scRNA-seq). We find that scHPF’s sparse low-dimensional representations, non-negativity, and explicit modeling of variable sparsity across genes and cells produce highly interpretable factors.
- Documentation
- Changelog
- Paper at Molecular Systems Biology
- Application to human tissue T cells across multiple donors and tissues
scHPF requires Python >= 3.6 and the packages:
- numba (version needed depends on Python version, but should be safe with 0.45)
- scikit-learn
- pandas
- (optional) loompy
The easiest way to set up an environment for scHPF is with conda, from either Miniconda or the full Anaconda Python distribution:
conda create -n schpf_p37 python=3.7 scikit-learn numba=0.50 pandas
# for newer anaconda versions
conda activate schpf_p37
# XOR older anaconda versions
source activate schpf_p37
# Optional, for using loom files as input to preprocessing
pip install -U loompy
Once you have set up the environment, clone this repository and install.
git clone git@github.com:simslab/scHPF.git
cd scHPF
pip install .
Testing your installation is important because not all micro-versions of numba play nicely with all micro-versions of Python or numpy, and sometimes issues vary across machines. The tests will catch some but not all such issues. From the scHPF base directory do:
conda install pytest
pytest
Please get in touch if tests fail, or if you get segmentation faults or very long training times with no automatic parallelization, and I'll be happy to try to help.
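Because these problems usually come down to the exact combination of Python, numba, and numpy versions, it may help to include them when you get in touch. The following is a minimal sketch, not part of scHPF: `collect_versions` is a hypothetical helper name, and it assumes Python >= 3.8 so that `importlib.metadata` is available in the standard library.

```python
import platform
from importlib import metadata  # standard library in Python >= 3.8 (assumption)


def collect_versions(packages=("numba", "numpy", "scikit-learn", "pandas", "loompy")):
    """Return a dict mapping 'python' and each package name to its installed version.

    Hypothetical helper for bug reports; packages that are not installed
    are reported as 'not installed' instead of raising an error.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into an issue.
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Run it inside the activated conda environment so that it reports the versions scHPF actually sees.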
scHPF has a scikit-learn like API. Trained models are stored in a serialized joblib format.
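To illustrate what a scikit-learn-like workflow with joblib serialization looks like in general, here is a minimal sketch. It deliberately does not use scHPF's own classes, whose names and signatures are not given here; scikit-learn's `NMF` stands in as an assumed placeholder for a non-negative factorization estimator, and the toy count matrix and file name are made up for the example. See the Documentation for the actual scHPF API.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative count matrix (50 cells x 20 genes), simulated for illustration.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 20))

# Stand-in estimator: NMF is NOT scHPF, it only follows the same
# fit/transform pattern that scikit-learn-style APIs share.
model = NMF(n_components=3, init="random", random_state=0, max_iter=500)
cell_scores = model.fit_transform(counts)  # one row per cell, one column per factor

# Round-trip the trained model through joblib, the serialization format
# in which trained models are stored.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model carries the same learned gene loadings.
assert np.allclose(restored.components_, model.components_)
```

A model saved this way can be reloaded later with `joblib.load` for downstream analysis without retraining.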
If you have any questions, errors, or issues, please open an issue and I'll be happy to provide whatever help and guidance I can.
Contributions to scHPF are welcome. Please get in touch if you would like to discuss an idea or check whether it's something I've already done but haven't pushed to master yet. To contribute, please fork scHPF, make your changes, and submit a pull request.
Hanna Mendes Levitin, Jinzhou Yuan, Yim Ling Cheng, Francisco JR Ruiz, Erin C Bush, Jeffrey N Bruce, Peter Canoll, Antonio Iavarone, Anna Lasorella, David M Blei, Peter A Sims. "De novo gene signature identification from single‐cell RNA‐seq with hierarchical Poisson factorization." Molecular Systems Biology, 2019. [Open access article]
Peter A. Szabo*, Hanna Mendes Levitin*, Michelle Miron, Mark E. Snyder, Takashi Senda, Jinzhou Yuan, Yim Ling Cheng, Erin C. Bush, Pranay Dogra, Puspa Thapa, Donna L. Farber, Peter A. Sims. "Single-cell transcriptomics of human T cells reveals tissue and activation signatures in health and disease." Nature Communications, 2019. [Open access article] * Co-first authors