-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtable_data.qmd
287 lines (189 loc) · 6.12 KB
/
table_data.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
---
title: "table data"
format: html
---
## table 类数据
此处做一个table类数据分析的记录。
table类数据集,或者说列联表,或是多个分类变量间的关系,不论是二分类还是多分类等。对其的分析都放在本笔记中。
一般的,proportion指的是分子在分母中,ratio是两个数的对比。
```{r, include=FALSE}
library(vcd)
library(vcdExtra)
library(gmodels)
```
table类数据此处主要记录,诸如Frequency and contingency tables。
```{r}
head(Arthritis)
```
### one-way table
`xtabs`的用法可以记成`table`函数的公式化版本。右边为需要cross-classified的变量,左边为频数(如果有的话).
one waay table 也就是讲一组变量间的计数数据,比如实验组和对照组两个分组,但是得到的结局是一个计数(或者是比例)数据,其的分析采用的方法主要是,
```{r}
# one-way table
mytable <- xtabs(~ Improved, data = Arthritis)
with(Arthritis, table(Improved))
# 求proportion
prop.table(mytable)
```
```{r}
plot(mytable)
```
```{r}
barplot(mytable)
```
```{r}
DescTools::BinomCI()
```
`prop.test` can be used for testing the null that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.
```{r}
prop.test(c(42,14,28), c(84,84,84))
```
```{r}
prop.trend.test()
```
卡方独立性检验,
```{r}
chisq.test(mytable)
```
```{r}
# Create a contingency table with vote counts for candidates A, B, and C
vote_counts <- c(42, 14, 28)
vote_table <- matrix(vote_counts, nrow = 1, byrow = TRUE)
# Name the rows and columns
rownames(vote_table) <- "Votes"
colnames(vote_table) <- c("Candidate A", "Candidate B", "Candidate C")
# Perform the chi-square test of independence
test_result <- chisq.test(vote_table)
# Print the test results
print(test_result)
```
If the p-value is less than the chosen significance level (e.g. 0.05), you would reject the null hypothesis and conclude that there is a significant difference among the proportions across the three groups (candidates).
Keep in mind that the chi-square test of independence does not provide information about which specific groups (candidates) have different proportions. If you want to test for differences between specific groups, you may conduct post-hoc pairwise comparisons using proportion tests or employ other methods like Marascuilo Procedure or adjusted standardized residuals.
*对于二分类的问题,似乎优选选择binom.test*
prop.test() uses the score test, not a Wald test。
binom.test(), as this always gives exact P-values.
```{r}
#
binom.test(x = 44, n = 63, p = 0.8, alternative = "two.sided")
# without Yates' continuity correction
prop.test(x = x1, n = n1, p = 0.8, alternative = "two.sided", correct = FALSE)
```
### two-way table
这是最为常见的列联表的形式,常见的null假设为rows和columns是否是独立的。
```{r}
# 常见的2x2
# two-way table
mytable1_2 <- xtabs(~ Treatment + Sex, data = Arthritis)
mytable1 <- xtabs(~ Treatment + Improved, data = Arthritis)
mytable1
mytable1_2
```
```{r}
summary(mytable1)
```
```{r}
vcd::mosaic(mytable1, shade = T, legend=T)
```
```{r}
# xtabs 类
class(mytable1)
class(with(Arthritis, table(Treatment, Improved)))
```
将Contingency table转变成data.frame在R中还是简单的:
不过这种转换和原始数据的格式还是不一样的。
```{r}
as.data.frame(mytable1)
# 怎么把这种格式的dataframe变成table呢
as.data.frame(mytable1) %>% pivot_wider(id_cols = Treatment,
names_from = Improved, values_from = Freq)
```
```{r}
# 按照行求百分比
prop.table(mytable1,1)
# 求行的和
margin.table(mytable1, 1)
```
```{r}
addmargins(mytable1)
```
*一个不错的包,这个作者似乎有系列包*
得到的结果和SAS PROC FREQ类似。
```{r}
gmodels::CrossTable(Arthritis$Improved, Arthritis$Treatment)
```
fisher不只能用于2X2的列联表形式,只要是two-way的形式,其都可以。
```{r}
# ?fisher.test
mytable1 |> fisher.test()
```
*卡方检验:*
通过调整correct参数可以得出卡方值和连续调整卡方值(Yates),对于似然比卡方检验,Likelihood Ratio (G test) for contingency tables,
可以通过`vcd::assocstats`得到。
```{r}
c1 <- chisq.test(mytable1)
c1
```
*interpret: *P值小于0.05,两个变量之间相关。
```{r}
effectsize::effectsize(c1)
```
```{r}
# 如何出现理论频数和实际频数的表格呢?
foo <- chisq.test(mytable1_2)
foo
names(foo)
```
```{r}
effectsize::effectsize(foo)
effectsize::phi(foo)
```
0 representing no association, and 1 representing a perfect association.
```{r}
vcdExtra::CMHtest(mytable1_2)
```
*association 分析(关联分析)*
此处得到的结果和LR chi的结果一致。
```{r}
vcd::assocstats(mytable1)
```
*interpret: *下面三个的值越大,说明association much stronger。
*Relative Risk:*
```{r}
DescTools::RelRisk(mytable1_2, conf.level=0.95)
```
```{r}
DescTools::OddsRatio(mytable1_2, conf.level = 0.95)
```
*Two Sample test for Proportion: *
```{r}
prop.test(mytable1_2)
```
*贝叶斯实现:*
```{r}
BFX <- contingencyTableBF(mytable1_2, sampleType = "jointMulti")
effectsize(BFX, type = "phi") # for 2 * 2
```
### three-way table
```{r}
# 一个三个分类变量组成的列联表
# three-way table
mytable2 <- xtabs(~ Treatment + Improved + Sex, data = Arthritis)
```
```{r}
vcd::mosaic(mytable2, shade=TRUE, legend=TRUE)
```
```{r}
# print more beautiful
ftable(mytable2)
# 自然也是可以增加sum和百分比的
```
Cochran-Mantel-Haenszel test, 分层卡方检验。用于比较两个分类变量在控制一个或多个混杂变量的条件下是否存在相关性。
Mantel-Haenszel test,检验的是行和列之间是否存在线性关系(linear trend),而Pearson X2检验的是一般关联(General Association)。线形趋势一般用于行变量和列变量均为有序变量的情形。
```{r}
# 注意输入数据中,哪个变量做为stratum,一般公式中最后一个变量为stratum
# 比如本例中,sex即为stratum
mytable2 |> mantelhaen.test()
```
```{r}
vcd::assocstats(mytable2)
```