This repository is a reorganization of all the work I did during my MSc in Mathematical Engineering @ Politecnico di Milano. Its purpose is to make an easy-to-use collection of statistical methods and techniques readily available. Addressing every problem these methods and techniques may encounter in a real-world data context is, on the other hand, out of scope: all the datasets used are toy datasets, included only to illustrate an application.
All the code is written in R. A Python counterpart will follow in the future.
The collection is divided into two main sections:
- Standard Statistics - parametric statistics: classical approaches that often rely on strong assumptions about the data
- Nonparametric Statistics - modern approaches that do not rely on heavy assumptions about the data
I uploaded the files in three formats: `.r` (script version, easy to download and run directly on a custom dataset), `.rmd`, and `.html` (for visualization purposes). Since GitHub does not render a preview of `.rmd` or `.html` files, I used an extension that allows the `.html` files to be viewed.
The files (code chunks, outputs, and plots) can be viewed at the links below.
Standard Statistics:
- 01 - PCA
- 02 - Multivariate Gaussian, One Population - Test and CR for the Mean
- 03 - Paired Gaussian Data - Test for the Mean
- 04 - Repeated Measures
- 05 - Multivariate Gaussian, Two Populations - Test for the Mean
- 06 - One-way ANOVA (p=1, g=6)
- 07 - One-way MANOVA (p=4, g=3)
- 08 - Two-way ANOVA (p=1, g=2, b=2)
- 09 - Two-way MANOVA (p=3, g=2, b=2)
- 10 - Supervised learning - LDA (univariate, bivariate), QDA (bivariate), KNN, Fisher’s argument
- 11 - Unsupervised learning - Hierarchical, K-means clustering
- 12 - Linear Models - Model, IC, IP, PCA Regression, Ridge Regression, Lasso Regression
- 13 - Linear Models - Variable Selection
- 14 - Functional Data Analysis (FDA)
- 15 - Functional Data Analysis (FDA) - Example
- 16 - Geostatistics
Nonparametric Statistics:
- 01 - Depth Measures
- 02 - Sign Test
- 03 - Rank Test (Mann-Whitney U Test)
- 04 - Signed Rank Test (Wilcoxon Signed Rank W Test)
- 05 - Permutation Test - Two Independent Samples (1-dim)
- 06 - Permutation Test - Two Independent Samples (n-dim)
- 07 - Permutation Test - Center of Symmetry
- 08 - Permutation Test - Regression
- 09 - Permutation Test - ANOVA, MANOVA
- 10 - Permutation Test - Confidence Intervals
- 11 - Bootstrap - Confidence Intervals
- 12 - Bootstrap - Regression
- 13 - Bootstrap - Two Independent Samples (1-dim)
- 14 - Bootstrap - One Sample (n-dim)
- 15 - Bootstrap - Test and p-values
- 16 - Nonparametric Regression
- 17 - Splines
- 18 - GAMs
- 19 - Full Conformal Prediction
- 20 - Split Conformal Prediction
- 21 - Conformal Prediction Intervals
- 22 - Survival Analysis
If any code does not work, or if any dataset is missing, please let me know.
As a Python fan, I plan to "translate" the work into Python (where, with the right libraries, I know it will take far fewer lines of code 😉). It would also be interesting to go beyond toy datasets and try some real-world applications.