Visualization of the statistical hypothesis test between two or more groups of categorical or numerical data.

shhschilling/visStatistics

---
output: rmarkdown::html_document
editor_options: 
  markdown: 
    wrap: 72
---

<!-- README.md is automatically generated from README.Rmd. Please only edit this Rmd file! -->

<!-- knitr before every resubmission -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5,
  out.width = "100%",
  fig.path = "man/figures/README-"
)
```

# visStatistics

Visualization of the statistical hypothesis test with the highest statistical
power between two or more groups of categorical or numerical data.

The package visStatistics, with its core function `visstat()`, allows a fast visualization
and a reproducible statistical analysis of the input data. Based on a decision tree, it selects the statistical hypothesis test
with the highest statistical power for the relationship between the dependent variable
(response) `varsample` and the independent variable (feature)
`varfactor`. The corresponding test statistics, including any
post-hoc analysis, are returned, and a graph showing the key statistics of the
test is generated.
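A decision tree between tests can be illustrated with a simplified two-group branch written in base R. The sketch assumes its own rule, which is not taken from the package: run a Shapiro-Wilk test on each group at level `alpha`, then choose Welch's t-test if both groups look normal and the Wilcoxon rank sum test otherwise. The helper `choose_two_sample_test()` is hypothetical. The actual decision tree of `visstat()` is documented in the vignette.

```{r}
# Hypothetical helper: a simplified two-group decision rule.
# This is NOT the visStatistics implementation.
choose_two_sample_test <- function(data, response, factor_col, alpha = 0.05) {
  groups <- split(data[[response]], droplevels(as.factor(data[[factor_col]])))
  stopifnot(length(groups) == 2)
  # Shapiro-Wilk p-value per group (each group needs 3 to 5000 values)
  normal_p <- vapply(groups, function(x) shapiro.test(x)$p.value, numeric(1))
  if (all(normal_p > alpha)) {
    # t.test() defaults to var.equal = FALSE, i.e. Welch's t-test
    t.test(groups[[1]], groups[[2]])
  } else {
    # Non-parametric fallback: Wilcoxon rank sum test
    wilcox.test(groups[[1]], groups[[2]])
  }
}

# Use a modified copy so that the built-in mtcars is left untouched
mtcars_am <- transform(mtcars, am = factor(am))
result <- choose_two_sample_test(mtcars_am, "mpg", "am")
result$method
```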

This fully automated workflow is particularly suited for browser-based interfaces to server-side deployments of R and databases. It has been successfully implemented for unbiased statistical analysis of medical data sets.

A detailed description of the package and its underlying
decision tree can be found in the `vignette` accompanying this package.

## Implemented tests

`lm()`, `t.test()`, `wilcox.test()`, `aov()`, `oneway.test()`,
`kruskal.test()`, `fisher.test()`, `chisq.test()`

### Implemented tests to check the normal distribution of standardized residuals

`shapiro.test()` and `ad.test()` (the latter from the `nortest` package)
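A residual normality check can be sketched in base R in three steps: fit the model, compute standardized residuals with `rstandard()`, and pass them to `shapiro.test()`. The model used here (square-root counts of `InsectSprays` explained by `spray`) is only an example. `ad.test()` is left out so that the sketch needs nothing beyond the `stats` package. This shows the kind of check involved and is not a copy of the package's internals.

```{r}
# One-way ANOVA model; aov() objects inherit from class "lm"
fit <- aov(sqrt(count) ~ spray, data = InsectSprays)

# Standardized residuals: each residual divided by its estimated standard error
std_res <- rstandard(fit)

# Shapiro-Wilk test; H0: the standardized residuals are normally distributed
sw <- shapiro.test(std_res)
sw$p.value
```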

### Implemented post-hoc tests

`TukeyHSD()` for `aov()` and `pairwise.wilcox.test()` for
`kruskal.test()`
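The pairing of omnibus tests with post-hoc tests can be illustrated by calling both routes directly in base R on `InsectSprays`. The Holm correction passed to `pairwise.wilcox.test()` is R's default `p.adjust.method`. Whether `visstat()` uses the same adjustment is not stated here, so treat it as an assumption.

```{r}
# Parametric route: one-way ANOVA, then Tukey's honest significant differences
fit <- aov(sqrt(count) ~ spray, data = InsectSprays)
tukey <- TukeyHSD(fit, conf.level = 0.95)
head(tukey$spray)

# Non-parametric route: Kruskal-Wallis test, then pairwise Wilcoxon rank sum
# tests with Holm-adjusted p-values; suppressWarnings() hides the
# "cannot compute exact p-value with ties" messages caused by tied counts
kw <- kruskal.test(count ~ spray, data = InsectSprays)
pw <- suppressWarnings(
  pairwise.wilcox.test(InsectSprays$count, InsectSprays$spray,
                       p.adjust.method = "holm")
)
pw$p.value
```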

## Installation of the latest stable version from CRAN

1.  Install the package `install.packages("visStatistics")`
2.  Load the package `library(visStatistics)`

## Installation of the development version from GitHub

1.  Install the devtools package from CRAN. Invoke R and type
    `install.packages("devtools")`
2.  Load the devtools package. `library(devtools)`
3.  Install the package from the GitHub repository
    `install_github("shhschilling/visStatistics")`
4.  Load the package `library(visStatistics)`
5.  Get help on the function usage with `?visstat`

## Getting Started

The package vignette introduces all features of
`visStatistics`. It documents in detail the algorithm of the decision
tree illustrated by the examples below.

## Examples

```{r example}
library(visStatistics)
```

### Welch's t-test

#### InsectSprays data set

```{r}
insect_sprays_a_b <- 
  InsectSprays[which(InsectSprays$spray == "A" | InsectSprays$spray == "B"), ]
insect_sprays_a_b$spray <- factor(insect_sprays_a_b$spray)
visstat(insect_sprays_a_b, "count", "spray")
```
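The numbers reported by `visstat()` can be cross-checked against a direct base-R call. `t.test()` defaults to `var.equal = FALSE`, which makes it Welch's t-test. The sketch also shows a more compact way to build the subset, using `%in%` and `droplevels()`.

```{r}
# Keep only sprays A and B and drop the unused factor levels C to F
insect_sprays_a_b <- droplevels(subset(InsectSprays, spray %in% c("A", "B")))

# Welch's t-test (var.equal = FALSE is the default)
welch <- t.test(count ~ spray, data = insect_sprays_a_b)
welch$method
```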

#### mtcars data set

```{r}
mtcars$am <- as.factor(mtcars$am)
t_test_statistics <- visstat(mtcars, "mpg", "am")
```

### Wilcoxon rank sum test

```{r}
grades_gender <- data.frame(
  sex = as.factor(c(rep("girl", 21), rep("boy", 23))),
  grade = c(
    19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4,
    20.3, 9.3, 11.3, 18.2, 17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.0,
    16.5, 5.1, 15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4, 7.3, 14.3,
    13.5, 8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6, 17.3, 19.9, 4.4, 2.1
  )
)

wilcoxon_statistics <- visstat(grades_gender, "grade", "sex")
```
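The same comparison can be run in base R with the grades split by group: the first 21 values belong to the girls and the last 23 to the boys, matching the `rep()` calls above. The per-group Shapiro-Wilk p-values are printed only to show the kind of normality check that would steer a decision toward a rank-based test. This sketch does not reproduce the package's exact criterion.

```{r}
girls <- c(19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4, 20.3, 9.3, 11.3, 18.2,
           17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.0, 16.5, 5.1)
boys <- c(15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4, 7.3, 14.3, 13.5,
          8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6, 17.3, 19.9, 4.4, 2.1)

# Normality check per group (Shapiro-Wilk p-values)
sapply(list(girls = girls, boys = boys), function(x) shapiro.test(x)$p.value)

# Wilcoxon rank sum test (Mann-Whitney U); statistic W
w <- wilcox.test(girls, boys)
w
```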

### ANOVA 

```{r}
insect_sprays_tr <- InsectSprays
insect_sprays_tr$count_sqrt <- sqrt(InsectSprays$count)
visstat(insect_sprays_tr, "count_sqrt", "spray")
```
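The square root is a common variance-stabilizing transformation for count data, which is presumably why this example analyzes `count_sqrt` rather than the raw counts. The base-R ANOVA table for the same model can be obtained from `summary()` of an `aov()` fit.

```{r}
insect_sprays_tr <- transform(InsectSprays, count_sqrt = sqrt(count))
fit <- aov(count_sqrt ~ spray, data = insect_sprays_tr)

# ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_table <- summary(fit)[[1]]
anova_table
```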


### One-way test

```{r}
one_way_npk <- visstat(npk, "yield", "block")
```
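In base R the corresponding call is presumably `oneway.test()`. With its default `var.equal = FALSE`, it performs Welch's one-way test, which does not assume equal variances across groups. The sketch applies it to the same `npk` model, with six blocks.

```{r}
# Welch's one-way test of yield across the six blocks of the npk experiment
welch_anova <- oneway.test(yield ~ block, data = npk)
welch_anova
```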

### Kruskal-Wallis test

The generated graphs can be saved in all formats available in the
`Cairo` package. Here we save the graphical output as a "pdf" file in the
temporary directory `tempdir()`, passed via `plotDirectory`:

```{r}
visstat(iris, "Petal.Width", "Species", 
        graphicsoutput = "pdf", plotDirectory = tempdir())
```
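The test itself can be cross-checked with `kruskal.test()`. For plots produced outside `visstat()`, base R's `pdf()` device from `grDevices` offers a comparable way to write a figure to a directory. The file name below is an arbitrary example.

```{r}
# Kruskal-Wallis rank sum test of Petal.Width across the three Species
kw <- kruskal.test(Petal.Width ~ Species, data = iris)
kw$method

# Save a base-R boxplot as PDF in the temporary directory
# (the file name "petal_width_by_species.pdf" is an arbitrary example)
pdf_file <- file.path(tempdir(), "petal_width_by_species.pdf")
pdf(pdf_file, width = 8, height = 5)
boxplot(Petal.Width ~ Species, data = iris)
invisible(dev.off())
file.exists(pdf_file)
```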

### Linear Regression

```{r}
linreg_cars <- visstat(cars, "dist", "speed")
```

Increasing the confidence level `conf.level` from the default 0.95 to
0.99 leads to wider confidence and prediction bands:

```{r linreg-cars-99}
linreg_cars_99 <- visstat(cars, "dist", "speed", conf.level = 0.99)
```
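The effect of the confidence level can be checked numerically with `predict()`. This assumes the plotted bands are the usual confidence and prediction intervals of an `lm()` fit, which the text does not state explicitly. The speeds of 10 and 20 mph are arbitrary example values.

```{r}
fit <- lm(dist ~ speed, data = cars)
new_speed <- data.frame(speed = c(10, 20))

# Width of a band returned by predict(..., interval = ...)
band_width <- function(band) band[, "upr"] - band[, "lwr"]

conf_95 <- predict(fit, new_speed, interval = "confidence", level = 0.95)
conf_99 <- predict(fit, new_speed, interval = "confidence", level = 0.99)
pred_99 <- predict(fit, new_speed, interval = "prediction", level = 0.99)

band_width(conf_95)
band_width(conf_99)
band_width(pred_99)
```

At the same level, the prediction band is always wider than the confidence band because it also includes the residual variance of a single new observation.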

### Pearson's Chi-squared test

Count data sets are often presented as multidimensional arrays,
so-called contingency tables, whereas `visstat()` requires a
`data.frame` with one row per observation. After converting the array
to a data frame of counts with `as.data.frame()`, it can be transformed
into this column-wise structure with the helper function `counts_to_cases()`:

```{r}
hair_eye_color_df <- counts_to_cases(as.data.frame(HairEyeColor))
visstat(hair_eye_color_df, "Hair", "Eye")
```
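A base-R sketch makes the counts-to-cases step and the test explicit. The helper `expand_counts()` is hypothetical and is not the package's `counts_to_cases()`. It assumes that the counts are stored in the `Freq` column created by `as.data.frame()`. Cross-tabulating the expanded cases recovers the hair-by-eye margins of `HairEyeColor`, on which Pearson's chi-squared test is then run.

```{r}
# Hypothetical helper (not the package's counts_to_cases()):
# repeat each row of a frequency table as often as its count says
expand_counts <- function(df, count_col = "Freq") {
  idx <- rep(seq_len(nrow(df)), times = df[[count_col]])
  cases <- df[idx, setdiff(names(df), count_col), drop = FALSE]
  rownames(cases) <- NULL
  cases
}

hair_eye_cases <- expand_counts(as.data.frame(HairEyeColor))
nrow(hair_eye_cases)  # one row per student

# Cross-tabulate the cases again and run Pearson's chi-squared test
hair_eye_table <- table(hair_eye_cases$Hair, hair_eye_cases$Eye)
chi <- chisq.test(hair_eye_table)
chi$parameter  # degrees of freedom: (4 - 1) * (4 - 1)
chi$expected   # expected counts under independence
```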

### Fisher's exact test

```{r}
# Contingency table of hair vs eye color for male students only
hair_eye_color_male <- HairEyeColor[, , 1]
# Slice out a 2 x 2 contingency table: hair Black/Brown vs eye Hazel/Green
black_brown_hazel_green_male <- hair_eye_color_male[1:2, 3:4]
# Transform to a data frame with one row per observation
black_brown_hazel_green_male <- counts_to_cases(
  as.data.frame(black_brown_hazel_green_male)
)
# Fisher's exact test
fisher_stats <- visstat(black_brown_hazel_green_male, "Hair", "Eye")
```
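Pearson's chi-squared test relies on a large-sample approximation. A common rule of thumb (Cochran's rule) treats it as unreliable when expected counts fall below 5. This README does not say whether `visstat()` uses exactly this criterion. The base-R sketch below only shows that the sliced 2 x 2 table contains such a small expected count, then runs `fisher.test()` on it directly.

```{r}
male <- HairEyeColor[, , "Male"]
black_brown_hazel_green <- male[c("Black", "Brown"), c("Hazel", "Green")]

# Expected counts under independence; suppressWarnings() hides the
# "Chi-squared approximation may be incorrect" warning
expected <- suppressWarnings(chisq.test(black_brown_hazel_green)$expected)
expected

# Fisher's exact test does not rely on the large-sample approximation
fisher <- fisher.test(black_brown_hazel_green)
fisher
```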

