-
Notifications
You must be signed in to change notification settings - Fork 0
Visualization of the statistical hypothesis test between two or more groups of categorical or numerical data.
License
shhschilling/visStatistics
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
--- output: rmarkdown::html_document editor_options: markdown: wrap: 72 --- <!-- README.md is automatically generated from README.Rmd. Please only edit this Rmd file! --> <!-- knitr before every resubmission --> ```{r, include = FALSE} knitr::opts_chunk$set( echo = TRUE, collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5, out.width = "100%", fig.path = "man/figures/README-" ) ``` # visStatistics Visualization of the statistical hypothesis test with the highest statistical power between two groups of categorical or numerical data. The package visStatistics with its core function `visstat()` allows a fast visualization and a reproducible statistical analysis of the presented data. Based on a decision tree, it selects the statistical hypothesis test with the highest statistical power between the dependent variable (response) `varsample` and the independent variable (feature) `varfactor`. The corresponding test statistics, including any post-hoc-analysis, are returned and a graph is generated showing the key statistics of the test. This fully automated workflow is particularly suited to browser-based interfaces to server-based deployments of R and data bases, and has been successfully implemented for unbiased statistical analysis of medical data sets. A detailed description of the package the and its underlying decision tree, can be found in the `vignette` accompanying this package. ## Implemented tests `lm()`, `t.test()`, `wilcox.test()`, `aov()`, `kruskal.test()`, `fisher.test()`, `chisqu.test()` ### Implemented tests to check the normal distribution of standardized residuals `shapiro.test()` and `ad.test()` ### Implemented post-hoc tests `TukeyHSD()` for `aov()`and `pairwise.wilcox.test()` for `kruskal.test()` ## Installation latest stable version from CRAN 1. Install the package `install.packages("visStatistics")` 2. Load the package `library(visStatistics)` ## Installation of developing version from GitHub 1. Install the devtools package from CRAN. Invoke R and type `install.packages("devtools")` 2. Load the devtools package. `library(devtools)` 3. Install the package from the github-repository `install_github("shhschilling/visStatistics")` 4. Load the package `library(visStatistics)` 5. Help on the function usage `?visstat` ## Getting Started The package vignette allows you to get familiar with all features of `visStatistics`. It documents in detail the algorithm of the decision tree illustrated by below examples. ## Examples ```{r example} library(visStatistics) ``` ### Welch's t-test #### InsectSprays data set ```{r} insect_sprays_a_b <- InsectSprays[which(InsectSprays$spray == "A" | InsectSprays$spray == "B"), ] insect_sprays_a_b$spray <- factor(insect_sprays_a_b$spray) visstat(insect_sprays_a_b, "count", "spray") ``` #### mtcars data set ```{r} ``` ```{r} mtcars$am <- as.factor(mtcars$am) t_test_statistics <- visstat(mtcars, "mpg", "am") ``` ### Wilcoxon rank sum test ```{r} grades_gender <- data.frame( sex = as.factor(c(rep("girl", 21), rep("boy", 23))), grade = c( 19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4, 20.3, 9.3, 11.3, 18.2, 17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.0, 16.5, 5.1, 15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4,7.3, 14.3, 13.5, 8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6, 17.3, 19.9, 4.4, 2.1 ) ) wilcoxon_statistics <- visstat(grades_gender, "grade", "sex") ``` ### ANOVA ```{r} insect_sprays_tr <- InsectSprays insect_sprays_tr$count_sqrt <- sqrt(InsectSprays$count) visstat(insect_sprays_tr, "count_sqrt", "spray") ``` ### One-way test ```{r} one_way_npk <- visstat(npk, "yield", "block") ``` ### Kruskal-Wallis test The generated graphs can be saved in all available formats of the `Cairo` package. Here we save the graphical output of type "pdf" in the `plotDirectory` `tempdir()`: ```{r} visstat(iris, "Petal.Width", "Species", graphicsoutput = "pdf", plotDirectory = tempdir()) ``` ### Linear Regression ```{r} linreg_cars <- visstat(cars, "dist", "speed") ``` Increasing the confidence level `conf.level` from the default 0.95 to 0.99 leads two wider confidence and prediction bands: ```{r pressure, echo = FALSE} linreg_cars_99 <- visstat(cars, "dist", "speed", conf.level = 0.99) ``` ### Pearson's Chi-squared test Count data sets are often presented as multidimensional arrays, so-called contingency tables, whereas `visstat()` requires a `data.frame` with a column structure. Arrays can be transformed to this column wise structure with the helper function `counts_to_cases()`: ```{r} hair_eye_color_df <- counts_to_cases(as.data.frame(HairEyeColor)) visstat(hair_eye_color_df, "Hair", "Eye") ``` ### Fisher's exact test ```{r} hair_eye_color_male <- HairEyeColor[, , 1] # Slice out a 2 by 2 contingency table black_brown_hazel_green_male <- hair_eye_color_male[1:2, 3:4] #Transform to data frame black_brown_hazel_green_male <- counts_to_cases(as.data.frame(black_brown_hazel_green_male)) # Fisher test fisher_stats <- visstat(black_brown_hazel_green_male, "Hair", "Eye") ```
About
Visualization of the statistical hypothesis test between two or more groups of categorical or numerical data.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published