A package containing the necessary tools for the statistical analysis of untargeted and targeted metabolomics data.
cimcb requires:
- Python (>=3.5)
- Bokeh (>=1.0.0)
- Keras
- NumPy (>=1.12)
- Pandas
- SciPy
- scikit-learn
- Statsmodels
- TensorFlow
- tqdm
The recommended way to install cimcb and its dependencies is with conda:
conda install -c cimcb cimcb
or with pip:
pip install cimcb
Alternatively, to install directly from GitHub:
pip install https://github.com/cimcb/cimcb/archive/master.zip
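After installing, it can be useful to confirm that the requirements listed above are importable. The following is a minimal sketch that uses only the standard library. The list of import names is an assumption based on the usual import name of each package (for example, scikit-learn is imported as `sklearn`), and `missing_modules` is a hypothetical helper, not part of cimcb.

```python
import importlib.util

# Assumed import names for the requirements listed above.
# Note: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "bokeh",
    "keras",
    "numpy",
    "pandas",
    "scipy",
    "sklearn",
    "statsmodels",
    "tensorflow",
    "tqdm",
    "cimcb",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All listed modules were found.")
```

This only checks that each module can be located; it does not check the minimum versions given in the requirements list.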
For further details on usage, refer to the docstrings.
- PLS_SIMPLS: Partial least-squares regression using the SIMPLS algorithm.
- PLS_NIPALS: Partial least-squares regression using the NIPALS algorithm.
- PCR: Principal component regression.
- PCLR: Principal component logistic regression.
- RF: Random forest.
- SVM: Support vector machine.
- NN_LinearLinear: Two-layer linear-linear neural network.
- NN_LinearSigmoid: Two-layer linear-sigmoid neural network.
- NN_SigmoidSigmoid: Two-layer sigmoid-sigmoid neural network.
- boxplot: Creates a boxplot using Bokeh.
- distribution: Creates a distribution plot using Bokeh.
- pca: Creates a PCA scores and loadings plot using Bokeh.
- permutation_test: Creates permutation test plots using Bokeh.
- roc_plot: Creates a ROC plot using Bokeh.
- scatter: Creates a scatterplot using Bokeh.
- scatterCI: Creates a scatterCI plot using Bokeh.
- kfold: Exhaustive search over param_dict calculating binary metrics using k-fold cross validation.
- holdout: Exhaustive search over param_dict calculating binary metrics using a hold-out set.
- Perc: Returns bootstrap confidence intervals using the percentile bootstrap interval.
- BC: Returns bootstrap confidence intervals using the bias-corrected bootstrap interval.
- BCA: Returns bootstrap confidence intervals using the bias-corrected and accelerated bootstrap interval.
- binary_metrics: Return a dict of binary stats with the following metrics: R2, auc, accuracy, precision, sensitivity, specificity, and F1 score.
- ci95_ellipse: Construct a 95% confidence ellipse using PCA.
- dict_mean: Calculate mean for all keys in dictionary.
- dict_median: Calculate median for all keys in dictionary.
- dict_perc: Calculate confidence intervals (percentile) for all keys in dictionary.
- dict_std: Calculate std for all keys in dictionary.
- knnimpute: kNN missing value imputation using Euclidean distance.
- load_dataCSV: Loads and validates the DataFile and PeakFile from CSV files.
- load_dataXL: Loads and validates the DataFile and PeakFile from an Excel file.
- nested_getattr: getattr for nested attributes.
- scale: Scales x (which can include nans) with method: 'auto', 'pareto', 'vast', or 'level'.
- table_check: Error checking for DataTable and PeakTable (used in load_dataXL).
- univariate_2class: Creates a table of univariate statistics (2 class).
- wmean: Returns Weighted Mean. Ignores NaNs and handles infinite weights.
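To make the four methods named in the `scale` entry above concrete, here is a minimal NumPy sketch of auto, pareto, vast, and level scaling applied column-wise while ignoring NaNs. It follows the commonly used definitions of these methods and is not cimcb's own implementation. The function name `scale_sketch`, its signature, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np


def scale_sketch(x, method="auto"):
    """Scale each column of x, ignoring NaNs (illustrative; not cimcb's code)."""
    x = np.asarray(x, dtype=float)
    mean = np.nanmean(x, axis=0)
    std = np.nanstd(x, axis=0, ddof=1)  # assumption: sample standard deviation
    centered = x - mean
    if method == "auto":
        # Unit variance: divide by the standard deviation.
        return centered / std
    if method == "pareto":
        # Divide by the square root of the standard deviation.
        return centered / np.sqrt(std)
    if method == "vast":
        # Auto scaling weighted by mean/std (inverse coefficient of variation).
        return (centered / std) * (mean / std)
    if method == "level":
        # Divide by the column mean.
        return centered / mean
    raise ValueError("unknown method: {!r}".format(method))


# Small example with one missing value in the second column.
data = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
auto_scaled = scale_sketch(data, "auto")
level_scaled = scale_sketch(data, "level")
```

Missing values stay as NaN in the output; they are only excluded when the column mean and standard deviation are computed.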
cimcb is licensed under the MIT license.
Professor David Broadhurst, Director of the Centre for Integrative Metabolomics & Computational Biology at Edith Cowan University. E-mail: d.broadhurst@ecu.edu.au