# Nonparametric Testing Hypothesis
[book](pdf/book10.pdf){target="_blank"}
[eStat YouTube Channel](https://www.youtube.com/channel/UCw2Rzl9A4rXMcT8ue8GH3IA){target="_blank"}
**CHAPTER OBJECTIVES**
The hypothesis tests in Chapters 7 through 9 assume that the
populations of continuous data follow normal distributions. However,
real-world data may not satisfy this assumption.
This chapter introduces nonparametric methods for testing hypotheses,
which convert the data to signs or ranks and therefore do not require
assumptions about the population distribution.
Section 10.1 introduces tests for the location parameter of a single
population such as the Sign Test and Signed Rank Test.
Section 10.2 introduces tests for comparing location parameters of two
populations such as the Wilcoxon Rank Sum Test.
Section 10.3 introduces tests for comparing location parameters of
several populations such as the Kruskal-Wallis Test and Friedman Test.
:::
:::
## Nonparametric Test for Location of Single Population
::: presentation-video-link
[presentation](pdf/1001.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/NLHCId7YoqM){.video-link target="_blank"}
:::
::: mainTable
The hypothesis test for a population mean in Chapter 7 uses the
t distribution when the sample is small, provided that the population
can be assumed to follow a normal distribution. A test like this, which
makes assumptions about the population distribution and tests a
population parameter using sample data, is called a parametric test. The
hypothesis tests for two population parameters in Chapter 8 and the
analysis of variance in Chapter 9 are also parametric tests, because
they assume that the populations follow normal distributions.
However, for real-world data it may not be appropriate to assume that
the population follows a normal distribution, or the sample may be too
small to justify the assumption. In some cases the data are not
continuous, or are ordinal such as ranks, and the parametric tests are
then not appropriate. Methods that test population parameters by
converting the data into signs or ranks, without assumptions about the
population distribution, are called distribution-free or nonparametric
tests.
Since a nonparametric test uses converted data such as signs or ranks,
some of the information in the data may be lost. Therefore, if the
population can be assumed to follow a normal distribution, there is no
reason to use a nonparametric test. In fact, when the population follows
a normal distribution, a nonparametric test has a higher probability of
the type 2 error than the corresponding parametric test at the same
significance level. However, a nonparametric test is more appropriate if
the data are from a population that does not follow a normal
distribution.
The hypothesis test for a population mean in Chapter 7 is based on the
sampling distribution of all possible sample means and the central limit
theorem. In contrast, a nonparametric test uses signs, obtained by
examining whether each data value is smaller or larger than the central
location parameter of the population (the Sign Test of Section 10.1.1),
or uses ranks of the data (the Wilcoxon Signed Rank Test of Section
10.1.2). Here, the central location parameter can be the population mean
or the population median, but it usually refers to the population
median, which is not affected by extreme values in the data.
Estimation of a population parameter can also be made by using a
nonparametric method, but this chapter only introduces nonparametric
hypothesis tests. Those interested in the nonparametric estimation
should refer to the relevant literature.
:::
### Sign Test
::: mainTable
Let's take a look at the sign test with the following example.
:::
::: mainTableGrey
**Example 10.1.1** A bag of cookies is marked with a weight of 200g. Ten
bags are randomly selected from several retailers and their weights are
measured as follows. Can you say that the bags contain as much as the
marked weight?
::: textLeft
203 204 197 195 201 205 198 199 194 207
:::
::: textLeft
Ex ⇨ eBook ⇨ EX100101_CookieWeight.csv
:::
::: textL20M20
1\) Draw a histogram of the data to check whether a parametric
hypothesis test can be performed.
:::
::: textL20M20
2\) Test the hypothesis at the 5% significance level using a
nonparametric method based on sign data, i.e., on whether each data
value is smaller or larger than 200.
:::
::: textL20M20
3\) Check the result of the above test using『eStatU』.
:::
**Answer**
::: textL20M20
1\) The null and alternative hypotheses to test the population mean can
be written as follows:
:::
::: textL20
$\quad \small \quad H_0 : \mu = 200 ,\;\; H_1 : \mu \ne 200$
In order to test the hypothesis using the parametric t-test in Chapter
7, it is necessary to assume that the population is normally
distributed, because the sample size of 10 is small. Let us check
whether the sample data look normally distributed by using a histogram.
Enter the data in『eStat』 as shown in [Figure 10.1.1]{.figure-ref}.
:::
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[30])" src="QR/EX100101.svg" type="image"/>
</div>
<div>
![](Figure/Fig100101.png){.imgFig600400}
::: figText
[Figure 10.1.1]{.figure-ref} Data input for cookie weight
:::
</div>
</div>
::: textL20
Click the icon for testing a hypothesis on the population mean and
select 'Weight' as the analysis variable in the variable selection box.
A dot graph with the 95% confidence interval will appear as in
[Figure 10.1.2]{.figure-ref}.
If you click the \[Histogram\] button in the options window below the
graph, a histogram as shown in [Figure 10.1.3]{.figure-ref} will appear.
The histogram does not provide sufficient grounds to assume that the
population follows a normal distribution. In such cases, applying the
parametric hypothesis test may lead to errors.
:::
![](Figure/Fig100102.svg){.imgFig600400}
::: figText
[Figure 10.1.2]{.figure-ref} Dot graph of the cookie weight
:::
![](Figure/Fig100103.svg){.imgFig600400}
::: figText
[Figure 10.1.3]{.figure-ref} Histogram of the cookie weight
:::
::: textL20M20
2\) In this case, the sample data can be converted to sign data by
examining only whether the weight of each cookie bag is greater than
200g (marked +) or not (marked -).
:::
sample data   sign data
------------- -----------
203           \+
204           \+
197           \-
195           \-
201           \+
205           \+
198           \-
199           \-
194           \-
207           \+
::: textL20
If the numbers of + signs and -- signs are similar, the weight of a
cookie bag would be approximately 200g. If there are many more + signs
than -- signs, the weight tends to be greater than 200g. If there are
many more -- signs than + signs, the weight tends to be less than 200g.
:::
::: textL20
Since the sign data only record whether each value is larger or smaller
than 200 and never use the concept of the mean, the test can be
considered as a test for the population median ($\small M$) as follows:
$\quad \small \quad H_0 : M = 200 ,\;\; H_1 : M \ne 200$
If $\small H_0$ is true, 'the number of + signs' (denoted $n_+$) or
'the number of -- signs' (denoted $n_-$) in the sign data above follows
a binomial distribution with parameters $n$=10, $p$=0.5
([Figure 10.1.4]{.figure-ref}).
:::
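The conversion to sign data can be sketched in a few lines of Python.
This is an illustrative sketch and not part of『eStat』; the variable
names are chosen here for the example, and values equal to 200 (there
are none in this sample) would be skipped.

```python
# Cookie weights from Example 10.1.1 and the hypothesized median M0.
weights = [203, 204, 197, 195, 201, 205, 198, 199, 194, 207]
m0 = 200

# Convert each observation to a sign: '+' if above M0, '-' if below.
# Observations equal to M0 carry no sign information and are skipped.
signs = ['+' if w > m0 else '-' for w in weights if w != m0]

n = len(signs)               # number of usable observations
n_plus = signs.count('+')    # test statistic n+
n_minus = signs.count('-')   # n- = n - n+

print(signs)
print(f"n = {n}, n+ = {n_plus}, n- = {n_minus}")
```

Here both counts equal 5, which matches the sign column of the table
above.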
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[82])" src="QR/eStatU420_Binomial.svg" type="image"/>
</div>
<div>
![](Figure/Fig100104.svg){.imgFig600400}
::: figText
[Figure 10.1.4]{.figure-ref} Binomial distribution when $n$=10, $p$=0.5
:::
</div>
</div>
::: textL20
Therefore, if $\small H_0$ is true, the most likely number of + signs is
5, and the values 0, 1 or 9, 10 are very unlikely to occur.
To test $\small H_0 : M$ = 200 at the 5% significance level with a
two-sided test, the rejection region should have a probability of 2.5%
at each end of the binomial distribution, so it is approximately as
follows:
:::
::: textL30
'If the number of + signs ($n_+$) is either 0, 1 (cumulative probability
from the left is 0.011) or 9, 10 (cumulative probability from the right
is 0.011), then reject $\small H_0$.'
:::
::: textL20
This rejection region has a total probability of 2\*0.011 = 0.022 which
is smaller than the significance level of 0.05. With a discrete
distribution such as the binomial distribution, it may be difficult to
find a rejection region whose probability is exactly equal to the
significance level. If we include one more value at each end of the
rejection region, the decision rule is as follows:
:::
::: textL30
'If the number of + signs ($n_+$) is either 0, 1, 2 (cumulative
probability from the left is 0.055) or 8, 9, 10 (cumulative probability
from the right is 0.055), then reject $\small H_0$.'
:::
::: textL20
This rejection region has a total probability of 2\*0.055 = 0.110 which
is greater than the significance level of 0.05. Therefore, the middle
values 1.5 (between 1 and 2) and 8.5 (between 8 and 9) can be used in
the decision rule as follows:
:::
::: textL30
'If the number of + signs $n_+$ \< 1.5 or $n_+$ \> 8.5, then reject
$\small H_0$.'
:::
::: textL20
This rule is also approximate. When testing with a discrete
distribution, none of the above decision rules can be said to be 'the
right one', and the analyst should select critical values whose
probability is close to the significance level. In this example, the
number of + signs ($n_+$) is 5, so we cannot reject the null hypothesis.
In other words, there is not enough evidence to conclude that the median
weight of the cookie bags differs from 200g.
:::
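The tail probabilities quoted in the decision rules above can be
reproduced directly from the binomial distribution $B(10, 0.5)$. The
following Python sketch uses only the standard library; the helper name
`binom_pmf` is introduced here for illustration. It also computes a
two-sided p-value by doubling the smaller tail, which is one common
convention and an assumption of this sketch.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, n_plus = 10, 5

# Cumulative probabilities from the left end of B(10, 0.5).
left_1 = sum(binom_pmf(k, n) for k in range(0, 2))   # P(X <= 1)
left_2 = sum(binom_pmf(k, n) for k in range(0, 3))   # P(X <= 2)
print(f"P(X <= 1) = {left_1:.3f}, P(X <= 2) = {left_2:.3f}")

# Two-sided p-value for the observed n+: twice the smaller tail, capped at 1.
lower = sum(binom_pmf(k, n) for k in range(0, n_plus + 1))   # P(X <= n+)
upper = sum(binom_pmf(k, n) for k in range(n_plus, n + 1))   # P(X >= n+)
p_value = min(1.0, 2 * min(lower, upper))
print(f"two-sided p-value = {p_value:.3f}")
```

The two left-tail values round to 0.011 and 0.055, and by symmetry the
right tails are the same. With $n_+$ = 5 the p-value is 1, far above
0.05, which agrees with not rejecting $\small H_0$.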
::: textL20M20
3\) Enter the data as shown in [Figure 10.1.5]{.figure-ref} in 『eStatU』and press the
[Execute]{.button-ref} button to show the test result as in [Figure 10.1.6]{.figure-ref}. It
shows the critical lines for the values that cut off the significance
level of 5% (2.5% on each side). For a discrete distribution such as the
binomial distribution, the choice of the final rejection region should
be determined by the analyst.
:::
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[112])" src="QR/eStatU940_TestSign.svg" type="image"/>
</div>
<div>
![](Figure/Fig100105.png){.imgFig600400}
::: figText
[Figure 10.1.5]{.figure-ref} Data input for sign test in 『eStatU』
:::
</div>
</div>
![](Figure/Fig100106.svg){.imgFig600400}
::: figText
[Figure 10.1.6]{.figure-ref} Result of sign test using『eStatU』
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[112])" src="QR/eStatU940_TestSign.svg" type="image"/>
</div>
<div>
**Practice 10.1.1** A psychologist randomly selected 9 workers with
disabilities from the production workers employed at various factories
in a large industrial complex and measured their work competency scores
as follows. The psychologist wants to test whether the population median
score is 40. Assume the population distribution is symmetric about the
mean.
::: textLeft
32, 52, 21, 39, 23, 55, 36, 27, 37
:::
::: textLeft
Ex ⇨ eBook ⇨ PR100101_CompetencyScore.csv
:::
::: textL20M20
1\) Check whether a parametric test is possible.
:::
::: textL20M20
2\) Apply the sign test with the significance level of 5%.
:::
</div>
</div>
:::
::: mainTable
When the population median is $M$, the sign test tests the null
hypothesis $M = M_0$ against the alternative $M \gt M_0$ (or
$M \lt M_0$, or $M \ne M_0$). If the population distribution is
symmetric about the mean, the sign test is equivalent to a test of the
population mean, because the mean and the median are the same in this
case.
When the sample size is $n$, the test statistic of the sign test is the
number of data values which are greater than $M_0$ ($n_+$). When the
null hypothesis is true, the random variable 'the number of + signs
($n_+$)' follows a binomial distribution with parameters $n$ and
$p$=0.5, i.e., $B(n,0.5)$. The number of data values which are less than
$M_0$ ($n_- = n - n_+$) also follows $B(n,0.5)$ and could be used
instead. Let us use $n_+$ in this section.
$B(n,0.5)_{\alpha}$ denotes the right 100$\times \alpha$ percentile of
this distribution, but the exact percentile may not exist, because the
distribution is discrete. In this case, the middle value of the two
nearest values is often used. [Table 10.1.1]{.table-ref} summarizes the
decision rule for each type of hypothesis of the sign test.
:::
::: textLeft
Table 10.1.1 Decision rule of the sign test
:::
--------------------------------------------------------------------------------------------------------
\                                   Decision Rule\
Type of Hypothesis                  Test Statistic $n_{+}$ = 'number of plus sign data'
----------------------------------- --------------------------------------------------------------------
1\) $\; H_0 : M = M_0$\             If $n_{+} > B(n, 0.5)_{α}$, then reject $H_0$
$\quad\,\, H_1 : M > M_0$

2\) $\; H_0 : M = M_0$\             If $n_{+} < B(n, 0.5)_{1-α}$, then reject $H_0$
$\quad\,\, H_1 : M < M_0$

3\) $\; H_0 : M = M_0$\             If
$\quad\,\, H_1 : M \ne M_0$         $n_{+} < B(n, 0.5)_{1-α/2} \quad or\quad n_{+} > B(n, 0.5)_{α/2}$,
                                    then reject $H_0$
--------------------------------------------------------------------------------------------------------
::: mainTableYellow
**☞ What if an observed value is the same as $M_0$?**
Observations that have the same value as $M_0$ are not used in the sign
test. In other words, $n$ is reduced by the number of such observations.
:::
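The decision rules of the sign test, together with the rule of dropping
observations equal to $M_0$, can be combined into one small function.
The sketch below is written in Python with the standard library only;
the function name `sign_test`, its `alternative` argument, and the
sample scores are hypothetical and introduced only for this
illustration. It reports an exact p-value from $B(n, 0.5)$ instead of
comparing $n_+$ with a critical value.

```python
from math import comb

def sign_test(data, m0, alternative="two-sided"):
    """Exact sign test of H0: M = m0; returns (n, n_plus, p_value)."""
    kept = [x for x in data if x != m0]      # drop observations equal to m0
    n = len(kept)
    n_plus = sum(1 for x in kept if x > m0)
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]   # B(n, 0.5)
    upper = sum(pmf[n_plus:])        # P(X >= n+)
    lower = sum(pmf[:n_plus + 1])    # P(X <= n+)
    if alternative == "greater":     # H1: M > m0
        p = upper
    elif alternative == "less":      # H1: M < m0
        p = lower
    else:                            # H1: M != m0
        p = min(1.0, 2 * min(lower, upper))
    return n, n_plus, p

# Hypothetical data: two of the eight values equal m0 = 50 and are dropped.
scores = [50, 53, 57, 61, 48, 50, 66, 59]
n, n_plus, p = sign_test(scores, 50, alternative="greater")
print(n, n_plus, round(p, 4))
```

In this hypothetical sample only 6 observations remain, 5 of them above
50, and the one-sided p-value is $P(X \ge 5)$ = 7/64 for
$X \sim B(6, 0.5)$.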
::: mainTable
As studied in Chapter 5, the binomial distribution $B(n,0.5)$ can be
approximated by the normal distribution $N(0.5n,0.5^2 n)$ if $n$ is
sufficiently large. Therefore, if the sample size is large, the test
statistic $n_+$ = 'the number of plus sign data' can be tested using
the normal distribution $N(0.5n,0.5^2 n)$. [Table 10.1.2]{.table-ref} summarizes the
decision rule for each hypothesis of the sign test in the case of large
samples.
:::
::: textLeft
Table 10.1.2 Decision rule of the sign test (large sample case)
:::
-----------------------------------------------------------------------------------------------------------------------------------------
Type of Hypothesis                  Decision Rule\
                                    Test Statistic $n_{+}$ = 'number of plus sign data'
----------------------------------- -----------------------------------------------------------------------------------------------------
1\) $\; H_0 : M = M_0$\             If $\frac{n_{+} -0.5n}{\sqrt{0.25n}} > z_{α}$, then reject $H_0$
$\quad\,\, H_1 : M > M_0$

2\) $\; H_0 : M = M_0$\             If $\frac{n_{+} -0.5n}{\sqrt{0.25n}} < z_{1-α}$, then reject $H_0$
$\quad\,\, H_1 : M < M_0$

3\) $\; H_0 : M = M_0$\             If $\left | \frac{n_{+} -0.5n}{\sqrt{0.25n}} \right| > z_{α/2}$, then reject $H_0$
$\quad\,\, H_1 : M \ne M_0$
-----------------------------------------------------------------------------------------------------------------------------------------
:::
:::
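The large-sample version of the sign test only needs the standardized
statistic $(n_+ - 0.5n)/\sqrt{0.25n}$. The following Python sketch
illustrates it with hypothetical counts (60 plus signs out of 100 usable
observations); the function name `sign_test_z` is introduced here for
the example, and no continuity correction is applied, in line with
Table 10.1.2.

```python
from math import sqrt, erfc

def sign_test_z(n_plus, n):
    """Standardized sign-test statistic (n+ - 0.5n) / sqrt(0.25n)."""
    return (n_plus - 0.5 * n) / sqrt(0.25 * n)

# Hypothetical large sample: 60 plus signs among n = 100 usable observations.
z = sign_test_z(60, 100)

# Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p_two_sided = erfc(abs(z) / sqrt(2))
print(f"z = {z:.2f}, two-sided p-value = {p_two_sided:.4f}")
```

Here $z$ = 2.0, which exceeds $z_{0.025}$ = 1.96, so the two-sided test
at the 5% significance level would reject $H_0$ for these hypothetical
counts.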
### Wilcoxon Signed Rank Sum Test
::: presentation-video-link
[presentation](pdf/100102.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/2pMOLvDTth8){.video-link target="_blank"}
:::
::: mainTable
The sign test described in the previous section converts the sample data
to + or - signs by examining whether each value is larger or smaller
than the median $M_0$. In this conversion, most of the information in
the original sample data is lost. To apply the Wilcoxon signed rank
test, we first subtract $M_0$ from each sample value and take the
absolute values of the differences. We then assign ranks to these
absolute values and calculate the sum of the ranks of the data larger
than $M_0$ and the sum of the ranks of the data smaller than $M_0$. If
the two rank sums are similar, we do not reject the hypothesis that the
population median is equal to $M_0$. The signed rank sum test is the
most widely used nonparametric method for testing the central location
parameter of a population. It takes into account the relative sizes of
the sample data as well as whether they are larger or smaller than
$M_0$.
:::
::: mainTableGrey
**Example 10.1.2** Using the cookie weight data of [Example 10.1.1]{.example-ref},
apply the signed rank test to see whether the weight of the cookie bag
is 200g or not with the significance level of 5%.
::: textLeft
203 204 197 195 201 205 198 199 194 207
:::
::: textLeft
Ex ⇨ eBook ⇨ EX100101_CookieWeight.csv
:::
Check the result of the signed rank test using『eStatU』.
**Answer**
The hypothesis for this problem is to test whether the population
median ($\small M$) is 200g or not.
::: textL20
$\quad \small H_0 : M = 200, \quad \quad H_1 : M \ne 200$
:::
The signed rank sum test examines not only whether each sample value is
greater than $\small M_0$ = 200g (+ sign) or not (- sign), but also the
rank of \|data -- 200\|. If there are tied values, the average rank is
assigned to each of them. For example, the value '1', which is the
smallest among \|data -- 200\|, appears twice; the corresponding ranks 1
and 2 are averaged to 1.5, and this averaged rank is assigned to each
value '1'.
-------------------------------------------------------------------------- --------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
Sample data                                                                203                               204   197   195   201   205   198   199   194   207
Sign data                                                                  \+                                \+    \-    \-    \+    \+    \-    \-    \-    \+
\|data -- 200\|                                                            3                                 4     3     5     1     5     2     1     6     7
Rank of \|data -- 200\|                                                    4.5                               6     4.5   7.5   1.5   7.5   3     1.5   9     10
Rank sum of ['+']{style="color:red"} sign ([$R_{+}$]{style="color:red"})   4.5 + 6 + 1.5 + 7.5 + 10 = 29.5
-------------------------------------------------------------------------- --------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
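The ranking step in the table above, including the averaging of tied
ranks, can be sketched in Python as follows. The helper
`average_ranks` is written for this illustration with the standard
library only; it is not the routine used by『eStatU』.

```python
weights = [203, 204, 197, 195, 201, 205, 198, 199, 194, 207]
m0 = 200

# Differences from M0; zero differences would be dropped.
diffs = [w - m0 for w in weights if w != m0]
abs_diffs = [abs(d) for d in diffs]

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

ranks = average_ranks(abs_diffs)
r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
print(ranks)
print(f"R+ = {r_plus}, R- = {r_minus}")
```

The ranks agree with the fourth row of the table, and the two rank sums
add up to 55, the sum of the ranks 1 to 10.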
The sum of all ranks is 1 + 2 + $\cdots$ + 10 = $\frac{10(10+1)}{2}$ = 55.
If the rank sum of + sign data ([$\small R_{+}$]{style="color:red"})
and the rank sum of -- sign data ($\small R_-$) are similar
(approximately 27.5 each), the null hypothesis $\small H_0 : M$ = 200g
is plausible. In this example, [$\small R_{+}$]{style="color:red"} =
29.5 and $\small R_-$ = 25.5. Since [$\small R_{+}$]{style="color:red"}
is greater than $\small R_{-}$, the weights which are greater than
200g appear to be slightly dominant. How large must the difference be
to be statistically significant?
To judge how large a value is statistically significant, we need the
distribution of the random variable
[$\small R_{+}$]{style="color:red"} = 'rank sum of + sign data' (or
$\small R_-$ = 'rank sum of -- sign data') when the null hypothesis is
true. If $\small H_0$ is true, all possible cases of
[$\small R_{+}$]{style="color:red"} are as listed in [Table 10.1.3]{.table-ref}. It is not
easy to examine all of these possible rankings by hand to create a
distribution table. 『eStatU』shows the distribution of the Wilcoxon
signed rank sum as in [Figure 10.1.7]{.figure-ref} and its table as in
[Table 10.1.4]{.table-ref}.
::: textLeft
Table 10.1.3 All possible cases of [$\small R_{+}$]{style="color:red"} =
'rank sum of + sign data'
:::
------------------------------------------------------------------------------
Number of data with +   All possible            All possible rank sum of
sign                    combination of ranks    [$R_{+}$]{style="color:red"}
----------------------- ----------------------- ------------------------------
0                       0                       0

1                       {1}, {2}, \... , {10}   1, 2, \... , 10

2                       {1,2}, {1,3}, \... ,    3, 4, \... , 11,\
                        {1,10},\                5, \... , 12,\
                        {2,3}, \... , {2,10},\  $\cdots$\
                        $\cdots$\               19
                        {9,10}

$\cdots$                $\cdots$                $\cdots$

10                      {1,2, \... ,10}         55
------------------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[113])" src="QR/eStatU95D_TestSignedRankD.svg" type="image"/>
</div>
<div>
![](Figure/Fig100107.svg){.imgFig600400}
::: figText
[Figure 10.1.7]{.figure-ref} Distribution of Wilcoxon signed rank sum when $n$=10
:::
</div>
</div>
::: textLeft
Table 10.1.4 Distribution of Wilcoxon signed rank sum when $n$ = 10
:::
Wilcoxon Signed Rank Sum Distribution   n = 10
--------------------------------------- ------------ -------------- --------------
$x$                                     $P(X = x)$   $P(X \le x)$   $P(X \ge x)$
0                                       0.0010       0.0010         1.0000
1                                       0.0010       0.0020         0.9990
2                                       0.0010       0.0029         0.9980
3                                       0.0020       0.0049         0.9971
4                                       0.0020       0.0068         0.9951
5                                       0.0029       0.0098         0.9932
6                                       0.0039       0.0137         0.9902
7                                       0.0049       0.0186         0.9863
8                                       0.0059       0.0244         0.9814
9                                       0.0078       0.0322         0.9756
$\cdots$                                $\cdots$     $\cdots$       $\cdots$
47                                      0.0059       0.9814         0.0244
48                                      0.0049       0.9863         0.0186
49                                      0.0039       0.9902         0.0137
50                                      0.0029       0.9932         0.0098
51                                      0.0020       0.9951         0.0068
52                                      0.0020       0.9971         0.0049
53                                      0.0010       0.9980         0.0029
54                                      0.0010       0.9990         0.0020
55                                      0.0010       1.0000         0.0010
Since it is a two-sided test with the 5% significance level, we look for
the values that cut off a probability of 2.5% at each end:
$P(X \le 8)$ = 0.0244 and $P(X \ge 47)$ = 0.0244. For a discrete
distribution, we cannot find the exact 2.5 percentile at either end.
Therefore, the decision rule can be written as follows:
::: textL20
'If $\small R_+ \le$ 8.5 or $\small R_+ \ge$ 46.5, then reject
$\small H_0$'
:::
Since $\small R_+$ = 29.5 in this problem, we cannot reject
$\small H_0$.
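The probabilities in Table 10.1.4 can be reproduced without listing
every sign pattern, by counting how many subsets of the ranks
{1, ..., 10} give each rank sum. The Python sketch below does this with
a small dynamic program; the function name `signed_rank_counts` is
introduced here for illustration, and the calculation assumes untied
ranks 1, 2, ..., $n$, as the table does.

```python
def signed_rank_counts(n):
    """counts[s] = number of subsets of {1, ..., n} whose ranks sum to s."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1                      # the empty set (no + signs)
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once.
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts

n = 10
counts = signed_rank_counts(n)
pmf = [c / 2**n for c in counts]       # each sign pattern has probability 1/2^n

p_le_8 = sum(pmf[:9])                  # P(X <= 8)
p_ge_47 = sum(pmf[47:])                # P(X >= 47)
print(f"P(X <= 8) = {p_le_8:.4f}, P(X >= 47) = {p_ge_47:.4f}")
print(f"P(X = 9) = {pmf[9]:.4f}")
```

Both tail probabilities equal 25/1024, which rounds to the value 0.0244
used in the decision rule above.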
After entering the data in『eStatU』as in [Figure 10.1.8]{.figure-ref}, pressing
the [Execute]{.button-ref} button will calculate the sample statistics and show the
test result as in [Figure 10.1.9]{.figure-ref}. The critical lines mark
the values that cut off the 5% significance level in total (a
probability of 2.5% at each end). For a discrete distribution, the
choice of the final rejection region should be determined by the
analyst.
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[114])" src="QR/eStatU950_TestSignedRank.svg" type="image"/>
</div>
<div>
![](Figure/Fig100108.png){.imgFig600400}
::: figText
[Figure 10.1.8]{.figure-ref} 『eStatU』Signed rank sum test
:::
</div>
</div>
![](Figure/Fig100109.png){.imgFig600400}
::: figText
[Figure 10.1.9]{.figure-ref} Signed rank sum test in 『eStatU』
:::
The signed rank sum test can also be done using 『eStat』. Enter the
data as shown in [Figure 10.1.10]{.figure-ref}, select 'Weight' as the analysis
variable in the variable selection box, and click the icon for testing
the population mean. A dot graph with the 95% confidence interval for
the population mean will then appear as in [Figure 10.1.11]{.figure-ref}.
![](Figure/Fig100110.png){.imgFig600400}
::: figText
[Figure 10.1.10]{.figure-ref} Data input for cookie weight
:::
![](Figure/Fig100111.svg){.imgFig600400}
::: figText
[Figure 10.1.11]{.figure-ref} Dot graph and confidence interval of cookie weight
:::
Enter the value 200 in the options below the graph and click the
\[Wilcoxon Signed Rank Sum Test\] button to display the test result
graph and result table as in [Figure 10.1.12]{.figure-ref}.
![](Figure/Fig100112.png){.imgFig600400}
::: figText
[Figure 10.1.12]{.figure-ref} Result of the Wilcoxon Signed Rank Sum Test
:::
:::
::: mainTable
If we denote the population median as $M$, the signed rank sum test
tests whether the population median is equal to $M_0$, against the
alternative that it is greater than (or less than, or not equal to)
$M_0$. If the population distribution is symmetric about the mean, the
signed rank sum test becomes a test of the population mean, because the
population median and mean are the same. The basic statistical model is
as follows: $$
X_i = M_0 + \epsilon_{i}, \quad i=1,2,...,n
$$ where the $\epsilon_i$'s are independent, follow the same
distribution, and are symmetric about 0.
If $x_1 , x_2 , ... , x_n$ are the sample data, the ranks of
$|x_i - M_0|$ are calculated first, and then the sum of the ranks of the
data which are greater than $M_0$ (the + sign data), denoted $R_+$, is
calculated. $R_+$ is the test statistic of the signed rank sum test. Its
sampling distribution under the null hypothesis, denoted $w_{+}(n)$, is
obtained by considering all possible cases.『eStatU』provides
$w_{+}(n)$ up to $n$ = 22. $w_{+}(n)_{α}$ denotes the right
100$\times α$ percentile of the $w_{+}(n)$ distribution, but the exact
percentile may not exist, because $w_{+}(n)$ is a discrete distribution;
the middle value of the two adjacent values is usually used as an
approximation. [Table 10.1.5]{.table-ref} summarizes the decision rule for the
Wilcoxon signed rank sum test for each type of hypothesis.
:::
::: textLeft
Table 10.1.5 Decision rule of Wilcoxon signed rank sum test
:::
------------------------------------------------------------------------------------------------------
\                                   Decision Rule\
Type of Hypothesis                  Test Statistic $R_{+}$ = 'Rank sum of + sign data of
                                    $|x_{i} - M_{0}|$'
----------------------------------- ------------------------------------------------------------------
1\) $\; H_0 : M = M_0$\             If $R_{+} > w_{+}(n)_{α}$, then reject $H_0$
$\quad\,\, H_1 : M > M_0$

2\) $\; H_0 : M = M_0$\             If $R_{+} < w_{+}(n)_{1-α}$, then reject $H_0$
$\quad\,\, H_1 : M < M_0$

3\) $\; H_0 : M = M_0$\             If
$\quad\,\, H_1 : M \ne M_0$         $R_{+} < w_{+}(n)_{1-α/2} \quad or\quad R_{+} > w_{+}(n)_{α/2}$,
                                    then reject $H_0$
------------------------------------------------------------------------------------------------------
::: mainTableYellow
**☞ What if an observed value is the same as $M_0$?**
Observed values that have the same value as $M_0$ are not used in the
test. In other words, $n$ is reduced by the number of such observations.
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[114])" src="QR/eStatU950_TestSignedRank.svg" type="image"/>
</div>
<div>
**Practice 10.1.2** A psychologist randomly selected 9 workers with
disabilities from the production workers employed at various factories
in a large industrial complex and measured their work competency scores
as follows. The psychologist wants to test whether the population median
score is 45. Assume the population distribution is symmetric about the
mean.
::: textLeft
32, 52, 21, 39, 23, 55, 36, 27, 37
:::
::: textLeft
Ex ⇨ eBook ⇨ PR100101_CompetencyScore.csv
:::
::: textL20M20
1\) Check whether a parametric test is possible.
:::
::: textL20M20
2\) Apply the Wilcoxon signed rank test with the significance level of
5%.
:::
::: textL20M20
3\) Compare this test result with the sign test of \[Practice 10.1.1\].
:::
</div>
</div>
:::
::: mainTable
If the sample size is large enough, the test statistic $R_+$
approximately follows a normal distribution with the following mean
$E(R_{+})$ and variance $V(R_{+})$ when the null hypothesis is true. $$
\begin{align}
E(R_+ ) &= \frac {n(n+1)}{4} \\
V(R_+ ) &= \frac {n(n+1)(2n+1)} {24}
\end{align}
$$
[Table 10.1.6]{.table-ref} summarizes the decision rule of the signed
rank sum test for each type of hypothesis in the case of large samples.
::: textLeft
Table 10.1.6 Decision rule of Wilcoxon signed rank sum test (large
sample case)
:::
-----------------------------------------------------------------------------------------------------------
Type of Hypothesis                  Decision Rule\
                                    Test Statistic $R_{+}$ = 'Rank sum of + sign data of $|x_{i} - M_{0}|$'
----------------------------------- -----------------------------------------------------------------------
1\) $\; H_0 : M = M_0$\             If $\frac{R_{+} - E(R_{+})}{\sqrt{V(R_{+})}} > z_{α}$, then reject
$\quad\,\, H_1 : M > M_0$           $H_0$

2\) $\; H_0 : M = M_0$\             If $\frac{R_{+} - E(R_{+})}{\sqrt{V(R_{+})}} < z_{1-α}$, then reject
$\quad\,\, H_1 : M < M_0$           $H_0$

3\) $\; H_0 : M = M_0$\             If
$\quad\,\, H_1 : M \ne M_0$         $\left | \frac{R_{+} - E(R_{+})}{\sqrt{V(R_{+})}} \right | > z_{α/2}$,
                                    then reject $H_0$
-----------------------------------------------------------------------------------------------------------
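To see how the normal approximation behaves, the following Python sketch
evaluates $E(R_+)$ and $V(R_+)$ for the cookie data of Example 10.1.2
($n$ = 10, $R_+$ = 29.5) and derives approximate two-sided 5% critical
values. A sample of size 10 is not really large, so this is only a
comparison with the exact rule of the example; the function name
`signed_rank_moments` and the use of $z_{0.025}$ = 1.96 are choices made
for this illustration, and ties are ignored here.

```python
from math import sqrt

def signed_rank_moments(n):
    """Mean and variance of R+ under H0 when there are no ties."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return mean, var

n, r_plus = 10, 29.5
mean, var = signed_rank_moments(n)
z = (r_plus - mean) / sqrt(var)
print(f"E(R+) = {mean}, V(R+) = {var}, z = {z:.3f}")

# Approximate two-sided 5% critical values, using z_{0.025} = 1.96.
lower_crit = mean - 1.96 * sqrt(var)
upper_crit = mean + 1.96 * sqrt(var)
print(f"reject H0 if R+ < {lower_crit:.2f} or R+ > {upper_crit:.2f}")
```

The approximate critical values, about 8.27 and 46.73, are close to the
values 8.5 and 46.5 obtained from the exact distribution, and
$|z|$ is far below 1.96, so the conclusion is the same.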
::: mainTable
The distribution of $w_{+}(n)$ is independent of the population
distribution. In other words, the Wilcoxon signed rank sum test is a
distribution free test. For example, if $n$ = 3, the distribution of
$w_{+}(3)$ can be obtained as follows:
:::
Rank 1   Rank 2   Rank 3   Possible value of $R_{+}$
-------- -------- -------- ---------------------------
\-       \-       \-       0
\+       \-       \-       1
\-       \+       \-       2
\-       \-       \+       3
\+       \+       \-       3
\+       \-       \+       4
\-       \+       \+       5
\+       \+       \+       6
::: mainTable
Therefore, the distribution of $w_{+}(3)$ can be calculated as follows
which is independent of the population distribution.
:::
$R_{+} = x$   $P(R_{+} = x)$
------------- ----------------
0             $\frac{1}{8}$
1             $\frac{1}{8}$
2             $\frac{1}{8}$
3             $\frac{2}{8}$
4             $\frac{1}{8}$
5             $\frac{1}{8}$
6             $\frac{1}{8}$
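The enumeration for $n$ = 3 can be checked by brute force: each of the
three ranks independently receives a + or - sign with probability 1/2,
so there are $2^3$ = 8 equally likely sign patterns. The Python sketch
below lists them with the standard library; it is meant only as a check
of the table above.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

n = 3
counts = Counter()
# Each rank 1..n independently gets a '+' (True) or '-' (False) sign.
for signs in product([False, True], repeat=n):
    r_plus = sum(rank for rank, plus in zip(range(1, n + 1), signs) if plus)
    counts[r_plus] += 1

for x in sorted(counts):
    print(x, Fraction(counts[x], 2**n))
```

Only the value 3 can be reached in two ways (rank 3 alone, or ranks 1
and 2 together), which gives the probability 2/8. Nothing in the
calculation depends on the population distribution.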
::: mainTable
If there are ties among the values of $|x_i - M_0|$, the average rank is
assigned to the tied values. In this case, the variance of $R_+$ for the
large-sample case is calculated using the following modified
formula. $$
V(R_+ ) = \frac{1}{24 } \left[ n(n+1)(2n+1) - \frac{1}{2} \sum_{j=1}^{g} t_{j}(t_{j}-1)(t_{j}+1) \right]
$$ Here $g$ = (number of tie groups) and $t_j$ = (size of the $j^{th}$
tie group, i.e., the number of observations in the tie group). An untied
observation forms a tie group of size $t_j$ = 1, which contributes
nothing to the sum.
:::
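::: mainTable
The tie-corrected variance can be sketched in Python as follows. The function name `signed_rank_variance_with_ties` and the example data are hypothetical. The sketch assumes the input holds the values $|x_i - M_0|$ actually used in the ranking, and that tied values compare exactly equal (for example, integers or rounded measurements).

```python
from collections import Counter


def signed_rank_variance_with_ties(abs_diffs):
    """Large-sample variance of R+ with the tie correction.

    abs_diffs: the values |x_i - M_0| that enter the ranking.
    """
    n = len(abs_diffs)
    # Size t_j of each group of equal values; untied values have t_j = 1.
    tie_sizes = Counter(abs_diffs).values()
    correction = sum(t * (t - 1) * (t + 1) for t in tie_sizes) / 2
    return (n * (n + 1) * (2 * n + 1) - correction) / 24


# Hypothetical data: one tie group of size 2 (value 2) and one of size 3 (value 5).
with_ties = signed_rank_variance_with_ties([1, 2, 2, 3, 5, 5, 5, 7])
without_ties = signed_rank_variance_with_ties([1, 2, 3, 4, 5, 6, 7, 8])
print(with_ties, without_ties)
```

For these values, $n$ = 8 gives $n(n+1)(2n+1)$ = 1224, and the two tie groups contribute $\frac{1}{2}(6 + 24)$ = 15, so ties reduce the variance from 51 to 50.375.
:::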
::: mainTablePink
### Multiple Choice Exercise
Choose one answer and click the Submit button.
::: textL30M30
10.1 Which of the following is NOT a reason to use a nonparametric test?
:::
<form name="Q1">
<label><input name="item" type="radio" value="1"/> Population is not normally distributed.</label><br/>
<label><input name="item" type="radio" value="2"/> Ordinal data. </label><br/>
<label><input name="item" type="radio" value="3"/> Data follows a normal distribution.</label><br/>
<label><input name="item" type="radio" value="4"/> There is an extreme value in the sample. </label><br/>
<p>
<input onclick="radio(10,1,Q1)" type="button" value="Submit"/>
<input id="ansQ1" size="15" type="text"/>
</p></form>
::: textL30M30
10.2 Which of the following nonparametric tests is for testing the
location parameter of a single population?
:::
<form name="Q2">
<label><input name="item" type="radio" value="1"/> Wilcoxon signed rank sum test</label><br/>
<label><input name="item" type="radio" value="2"/> Wilcoxon rank sum test</label><br/>
<label><input name="item" type="radio" value="3"/> Kruskal-Wallis test</label><br/>
<label><input name="item" type="radio" value="4"/> Friedman test</label><br/>
<p>
<input onclick="radio(10,2,Q2)" type="button" value="Submit"/>
<input id="ansQ2" size="15" type="text"/>
</p></form>
::: textL30M30
10.3 What is the sign test?
:::
<form name="Q3">
<label><input name="item" type="radio" value="1"/> Test for the location parameter of a single population</label><br/>
<label><input name="item" type="radio" value="2"/> Test for two location parameters of two populations</label><br/>
<label><input name="item" type="radio" value="3"/> Test for several location parameters of multiple populations</label><br/>
<label><input name="item" type="radio" value="4"/> Test for the randomized block design</label><br/>
<p>
<input onclick="radio(10,3,Q3)" type="button" value="Submit"/>
<input id="ansQ3" size="15" type="text"/>
</p></form>
::: textL30M30
10.4 Which data transformation is often used in nonparametric tests?
:::
<form name="Q4">
<label><input name="item" type="radio" value="1"/> log transformation</label><br/>
<label><input name="item" type="radio" value="2"/> exponential transformation</label><br/>
<label><input name="item" type="radio" value="3"/> (0-1) transformation</label><br/>
<label><input name="item" type="radio" value="4"/> ranking transformation</label><br/>
<p>
<input onclick="radio(10,4,Q4)" type="button" value="Submit"/>
<input id="ansQ4" size="15" type="text"/>
</p></form>
::: textL30M30
10.5 What is the test statistic used for the sign test?
:::
<form name="Q5">
<label><input name="item" type="radio" value="1"/> rank</label><br/>
<label><input name="item" type="radio" value="2"/> (number of + signs) - (number of - signs)</label><br/>
<label><input name="item" type="radio" value="3"/> degrees of freedom</label><br/>
<label><input name="item" type="radio" value="4"/> (number of + signs)</label><br/>
<p>
<input onclick="radio(10,5,Q5)" type="button" value="Submit"/>
<input id="ansQ5" size="15" type="text"/>
</p></form>
:::
:::
:::
## Nonparametric Test for Comparing Locations of Two Populations
::: presentation-video-link
[presentation](pdf/1002.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/_w0ebMxObrg){.video-link target="_blank"}
:::
::: mainTable
The tests for two population means in Chapter 8 used the t-distribution
for small samples, under the assumption that each population is normally
distributed. However, this assumption may not be appropriate for
real-world data, or there may not be enough sample data to justify it.
In addition, if the collected data are ordinal, such as rankings, the
parametric t-test is not appropriate. In such cases, a nonparametric
method is used, which tests the location parameters by converting the
data to ranks without assuming a particular population distribution.
This section introduces the Wilcoxon rank sum test.
Because nonparametric tests convert data into ranks, some information
in the data is lost. Therefore, if the data are normally distributed,
there is no reason to apply a nonparametric test. If the data do not
follow a normal distribution, however, a nonparametric method is more
appropriate.
As in Chapter 8, this section introduces nonparametric tests for the
location parameters of two populations in two settings: samples drawn
independently from each population, and paired samples.
:::
### Independent Samples: Wilcoxon Rank Sum Test