A collection of scripts for common procedures (e.g. PCA)
Requirements vary by script and this list will probably get outdated quickly, so please check the requirements of each individual script.
See also the requirements files and the Dockerfile for more information.
Most of the scripts are written in R or Python, so at the very least you will need:
- R >= 3.2
- Python >= 3.5
- r-docopt
- r-data.table
- r-ggplot2
# You may want to create a specific environment with conda first, then run:
pip install git+https://github.com/EpiCompBio/stats_utils.git
# Create a folder or a whole data science project, e.g. project_quickstart -n my_project
cd my_project/results
mkdir tests
cd tests
# You may need to install missing dependencies, e.g.:
conda install r-docopt r-data.table r-ggplot2 r-cowplot r-ggthemes
# Simulate some data:
simulate_cont_var.py -h
simulate_cont_var.py --createDF --sample-size=1000 --var-size=50 -O cont_var_sim_data
# The file will have rows as features/variables and columns as
# samples/individuals. Transpose it for prcomp in run_PCA:
transpose.R -I cont_var_sim_data.tsv
# Run principal components:
run_PCA.R -h
run_PCA.R -I cont_var_sim_data.transposed.tsv
# Check the outputs:
head cont_var_sim_data* | cut -f1-5
# 'open' is the macOS command; on Linux use xdg-open instead:
open top_10_PCs_cont_var_sim_data.transposed.pca.svg
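
To make the transposition step above concrete, here is a minimal Python sketch of what swapping rows and columns of a tab-separated file means. It is not the code in transpose.R. The function name `transpose_tsv`, the toy data, and the layout (a header row of individual IDs and a first column of variable names) are assumptions made for illustration only.

```python
import csv
import io


def transpose_tsv(text):
    """Swap rows and columns of tab-separated text (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) groups the i-th field of every row, i.e. the columns.
    # Note: zip stops at the shortest row, so ragged input would be truncated.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Assumed toy input: 2 variables (rows) measured in 3 individuals (columns).
features_by_samples = (
    "id\tind_1\tind_2\tind_3\n"
    "var_1\t0.1\t0.2\t0.3\n"
    "var_2\t1.0\t2.0\t3.0\n"
)

# After transposing, each row is an individual and each column a variable,
# which is the orientation described above as needed for prcomp.
samples_by_features = transpose_tsv(features_by_samples)
print(samples_by_features)
```

Transposing twice returns the original text, which is a quick sanity check when working with a real file.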
Pull requests welcome!
If you run into any issues or have suggestions, please report them in the issue tracker.