table_data.qmd

---
title: "table data"
format: html
---

## table 类数据

此处做一个table类数据分析的记录。

table类数据集，或者说列联表，或是多个分类变量间的关系，不论是二分类还是多分类等。对其的分析都放在本笔记中。

一般的，proportion指的是分子在分母中，ratio是两个数的对比。


```{r, include=FALSE}
library(vcd)
library(vcdExtra)
library(gmodels)
```

table类数据此处主要记录，诸如Frequency and contingency tables。

```{r}
head(Arthritis)
```

### one-way table 

`xtabs`的用法可以记成`table`函数的公式化版本。右边为需要cross-classified的变量，左边为频数（如果有的话）.

one waay table 也就是讲一组变量间的计数数据，比如实验组和对照组两个分组，但是得到的结局是一个计数（或者是比例）数据，其的分析采用的方法主要是，

```{r}
# one-way table
mytable <- xtabs(~ Improved, data = Arthritis)

with(Arthritis, table(Improved))

# 求proportion
prop.table(mytable)
```

```{r}
plot(mytable)
```

```{r}
barplot(mytable)
```

```{r}
DescTools::BinomCI()
```

`prop.test` can be used for testing the null that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.
```{r}
prop.test(c(42,14,28), c(84,84,84))
```


```{r}
prop.trend.test()
```


卡方独立性检验，
```{r}
chisq.test(mytable)
```


```{r}
# Create a contingency table with vote counts for candidates A, B, and C
vote_counts <- c(42, 14, 28)
vote_table <- matrix(vote_counts, nrow = 1, byrow = TRUE)

# Name the rows and columns
rownames(vote_table) <- "Votes"
colnames(vote_table) <- c("Candidate A", "Candidate B", "Candidate C")


# Perform the chi-square test of independence
test_result <- chisq.test(vote_table)

# Print the test results
print(test_result)

```
If the p-value is less than the chosen significance level (e.g. 0.05), you would reject the null hypothesis and conclude that there is a significant difference among the proportions across the three groups (candidates).
Keep in mind that the chi-square test of independence does not provide information about which specific groups (candidates) have different proportions. If you want to test for differences between specific groups, you may conduct post-hoc pairwise comparisons using proportion tests or employ other methods like Marascuilo Procedure or adjusted standardized residuals.


*对于二分类的问题，似乎优选选择binom.test*
prop.test() uses the score test, not a Wald test。
binom.test(), as this always gives exact P-values.
```{r}
# 
binom.test(x = 44, n = 63, p = 0.8, alternative = "two.sided")

# without Yates' continuity correction
prop.test(x = x1, n = n1, p = 0.8, alternative = "two.sided", correct = FALSE)
```


### two-way table

这是最为常见的列联表的形式，常见的null假设为rows和columns是否是独立的。

```{r}
# 常见的2x2
# two-way table
mytable1_2 <- xtabs(~ Treatment + Sex, data = Arthritis)

mytable1 <- xtabs(~ Treatment + Improved, data = Arthritis)

mytable1

mytable1_2
```

```{r}
summary(mytable1)

```

```{r}
vcd::mosaic(mytable1, shade = T, legend=T)
```


```{r}
# xtabs 类
class(mytable1)

class(with(Arthritis, table(Treatment, Improved)))
```

将Contingency table转变成data.frame在R中还是简单的：
不过这种转换和原始数据的格式还是不一样的。
```{r}
as.data.frame(mytable1)

# 怎么把这种格式的dataframe变成table呢
as.data.frame(mytable1) %>% pivot_wider(id_cols = Treatment, 
                                        names_from = Improved, values_from = Freq)
```


```{r}
# 按照行求百分比
prop.table(mytable1,1)

# 求行的和
margin.table(mytable1, 1)
```

```{r}
addmargins(mytable1)
```

*一个不错的包，这个作者似乎有系列包*
得到的结果和SAS PROC FREQ类似。
```{r}
gmodels::CrossTable(Arthritis$Improved, Arthritis$Treatment)
```


fisher不只能用于2X2的列联表形式，只要是two-way的形式，其都可以。
```{r}
# ?fisher.test
mytable1 |> fisher.test()
```


*卡方检验：*
通过调整correct参数可以得出卡方值和连续调整卡方值(Yates)，对于似然比卡方检验，Likelihood Ratio (G test) for contingency tables,
可以通过`vcd::assocstats`得到。
```{r}
c1 <- chisq.test(mytable1)

c1
```
*interpret: *P值小于0.05，两个变量之间相关。


```{r}
effectsize::effectsize(c1)

```


```{r}
# 如何出现理论频数和实际频数的表格呢？
foo <- chisq.test(mytable1_2)

foo

names(foo)
```


```{r}
effectsize::effectsize(foo)

effectsize::phi(foo)
```

0 representing no association, and 1 representing a perfect association.


```{r}
vcdExtra::CMHtest(mytable1_2)
```


*association 分析(关联分析)*
此处得到的结果和LR chi的结果一致。
```{r}
vcd::assocstats(mytable1)
```
*interpret: *下面三个的值越大，说明association much stronger。


*Relative Risk:*
```{r}
DescTools::RelRisk(mytable1_2, conf.level=0.95)
```

```{r}
DescTools::OddsRatio(mytable1_2, conf.level = 0.95)
```

*Two Sample test for Proportion: *
```{r}
prop.test(mytable1_2)
```


*贝叶斯实现：*
```{r}
BFX <- contingencyTableBF(mytable1_2, sampleType = "jointMulti")

effectsize(BFX, type = "phi") # for 2 * 2
```


### three-way table 

```{r}
# 一个三个分类变量组成的列联表
# three-way table
mytable2 <- xtabs(~ Treatment + Improved + Sex, data = Arthritis)
```


```{r}
vcd::mosaic(mytable2, shade=TRUE, legend=TRUE)
```

```{r}
# print more beautiful
ftable(mytable2)

# 自然也是可以增加sum和百分比的
```


Cochran-Mantel-Haenszel test, 分层卡方检验。用于比较两个分类变量在控制一个或多个混杂变量的条件下是否存在相关性。

Mantel-Haenszel test,检验的是行和列之间是否存在线性关系（linear trend），而Pearson X2检验的是一般关联（General Association）。线形趋势一般用于行变量和列变量均为有序变量的情形。
```{r}
# 注意输入数据中，哪个变量做为stratum，一般公式中最后一个变量为stratum
# 比如本例中，sex即为stratum
mytable2 |> mantelhaen.test()
```


```{r}
vcd::assocstats(mytable2)
```