# Data Summary Using Tables and Measures
[book](pdf/book04.pdf){target="_blank"}
[eStat YouTube Channel](https://www.youtube.com/channel/UCw2Rzl9A4rXMcT8ue8GH3IA){target="_blank"}
**CHAPTER OBJECTIVES**
Chapters 2 and 3 discussed how to visualize qualitative and quantitative
data using graphs. Visualizing data with graphs makes it easy and fast to
see the information contained in the data. However, if you want more
detailed information, it is better to summarize the data by using tables
or measures.
In section 4.1, we introduce a frequency table as a summary of a single
variable.
In section 4.2, we introduce a contingency table as a summary of two
variables.
In section 4.3, we introduce measures that summarize quantitative data,
and a box plot.
:::
:::
## Frequency Table for Single Variable
::: presentation-video-link
[presentation](pdf/0401.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/R761krGIxBM){.video-link target="_blank"}
:::
::: mainTable
![](Icon/eStat_icon18_freq.png){.imgIcon}\
A frequency table of qualitative data summarizes the frequency of each
possible value of a categorical variable. It is the most commonly used
tool for summarizing qualitative data. The frequency table also shows
relative frequencies (percents), which are calculated by dividing the
frequency of each category by the total number of observations, and
cumulative relative frequencies, which accumulate the relative
frequencies in the order of the categories. The bar graph, the pie chart,
and the band graph in Chapter 2 are drawn by using this frequency table.
The frequency table is usually used to summarize qualitative data, but it
can also summarize quantitative data after transforming them into
qualitative data: all possible values of the quantitative data are
divided into several non-overlapping intervals, and the number of
observations belonging to each interval is counted.
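
The columns of such a table can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by 『eStat』; the helper name `frequency_table` is hypothetical, and the data are the gender codes of Table 4.1.1.

```python
from collections import Counter

def frequency_table(values, labels=None):
    """Return (category, frequency, relative %, cumulative relative %) rows."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for code in sorted(counts):              # categories in the order of their codes
        freq = counts[code]
        relative = 100 * freq / n            # frequency / total number of observations
        cumulative += relative               # accumulate in category order
        name = labels.get(code, code) if labels else code
        rows.append((name, freq, relative, cumulative))
    return rows

# Gender codes of Table 4.1.1 (1 = Male, 2 = Female)
gender = [1, 2, 1, 2, 1, 1, 1, 2, 1, 2]
for name, freq, rel, cum in frequency_table(gender, {1: "Male", 2: "Female"}):
    print(f"{name:<7} {freq:3d} {rel:6.1f}% {cum:6.1f}%")
```

Sorting the codes keeps the cumulative column in code order; for categories with a natural order of their own, that order should be used instead.
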
:::
::: mainTableYellow
**Frequency Table**
**Frequency table** of qualitative data summarizes the frequency of each
possible value of a categorical variable.
The frequency table can also be used to summarize quantitative data by
transforming them into qualitative data. All possible values of the
quantitative data are divided into several non-overlapping intervals,
and the number of observations belonging to each interval is counted.
:::
::: mainTable
A frequency table of sample data can be used to test the goodness of
fit, that is, whether the data follow a particular distribution, as
described in Chapter 11.
:::
### Frequency Table for Categorical Variable
::: mainTableGrey
**Example 4.1.1** **(Gender Raw Data)**
In Example 2.3.1, a bar graph of the gender variable in a class was
drawn from the raw data shown in [Table 4.1.1]{.table-ref}. The bar graph was
drawn by using the frequencies of male and female students. Use
『eStat』 to create a frequency table for this raw data of the gender
variable.
Table 4.1.1 Gender raw data
Gender
--------
1
2
1
2
1
1
1
2
1
2
Ex ⇨ eBook ⇨ EX040101_Categorical_Gender.csv.
**Answer**
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[18])" src="QR/EX040101.svg" type="image"/>
</div>
<div>
Enter the gender data of [Table 4.1.1]{.table-ref} into 『eStat』 as in [Figure 4.1.1]{.figure-ref}.
Use the [Edit Var]{.button-ref} button to enter the variable name 'Gender' and
its value labels, 1 for 'Male' and 2 for 'Female', as in [Figure 4.1.2]{.figure-ref}.
Data whose value labels have been edited must be saved in JSON format so
that the entered information is not lost. To load a file in JSON format,
you must use the JSON Open icon, which is for opening files in JSON
format.
![](Figure/Fig040101.png){.imgFig300400}
::: figText
[Figure 4.1.1]{.figure-ref} Input gender data of a class
:::
</div>
</div>
![](Figure/Fig040102.png){.imgFig300400}
::: figText
[Figure 4.1.2]{.figure-ref} Input variable name and value label
:::
If you select the gender variable as the 'Analysis Var' in the
variable selection box as shown in [Figure 4.1.1]{.figure-ref}, a bar graph of the
gender is drawn as in [Figure 4.1.3]{.figure-ref}. Then, if you click the Frequency
Table icon, the frequency table of the gender variable will appear in
the Log Area, as in [Figure 4.1.4]{.figure-ref}. This frequency table is used to
draw the bar graph or the pie chart.
![](Figure/Fig040103.svg){.imgFig600540}
::: figText
[Figure 4.1.3]{.figure-ref} Bar graph of the gender
:::
![](Figure/Fig040104.png){.imgFig300200}
::: figText
[Figure 4.1.4]{.figure-ref} Frequency table of the gender
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[56])" src="QR/PR040101.svg" type="image"/>
</div>
<div>
**Practice 4.1.1** **(Vegetable Preference)**\
Data on the gender (1: male, 2: female) and vegetable
preference (1: lettuce, 2: spinach, 3: pumpkin, 4: eggplant) of an
elementary school class can be found at the following location of
『eStat』.
Ex ⇨ eBook ⇨ PR040101_Categorical_VegetablePrefByGender.csv.
By using 『eStat』, find a frequency table of the vegetable preference.
</div>
</div>
:::
### Frequency Table for Quantitative Variable
::: mainTable
Quantitative data can have too many possible values, so a frequency
table of the raw values may not be easy to analyze. To make a frequency
table of quantitative data that can be analyzed easily, the possible
values of the data are divided into several intervals and the frequency
of each interval is investigated. Generally, the intervals do not overlap
with each other, and the number of data values in each interval is
counted. For this purpose, the maximum and the minimum of the data are
first found to calculate the range of the data, and then the number of
intervals is determined. The number of intervals is typically between 5
and 10, but it may depend on the researcher's choice. Some researchers
prefer to use the square root of the number of observations. Once the
number of intervals is determined, the range of the data (maximum -
minimum) is divided by the number of intervals to calculate the width of
each interval. The starting and ending points of each interval are
usually described as 'greater than or equal to (≥) $a$ and less than
(\<) $b$', which means the half-open interval $[a, b)$.
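
The interval procedure above can be sketched in Python as follows, assuming equal-width half-open intervals $[a, b)$ and a starting point at or below the minimum. The helper `interval_frequency_table` is hypothetical; the ages are the library visitor data of Practice 4.1.2.

```python
import math

def interval_frequency_table(values, start, width):
    """Count observations in equal-width half-open intervals [a, b)."""
    if min(values) < start:
        raise ValueError("start must be at or below the minimum")
    # enough intervals for the last one to contain the maximum
    k = math.floor((max(values) - start) / width) + 1
    counts = [0] * k
    for x in values:
        counts[int((x - start) // width)] += 1   # index of the interval holding x
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Ages of the 30 library visitors in Practice 4.1.2
ages = [28, 55, 26, 35, 43, 47, 47, 17, 35, 36, 48, 47, 34, 28, 43,
        20, 30, 53, 27, 32, 34, 43, 18, 38, 29, 44, 67, 48, 45, 43]

data_range = max(ages) - min(ages)          # maximum - minimum = 50
k_suggested = round(math.sqrt(len(ages)))   # square-root rule: sqrt(30) is about 5.5, so 5
print("range:", data_range, "width:", data_range / k_suggested)

for a, b, c in interval_frequency_table(ages, start=10, width=10):
    print(f"[{a}, {b})  {c}")
```

Starting at 10 rather than at the minimum 17 gives round interval boundaries; this is a presentation choice, not a requirement of the method.
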
:::
::: mainTableGrey
**Example 4.1.2** **(Otter length)**
Data of 30 otter lengths can be found at the following location of
『eStat』.
Ex ⇨ eBook ⇨ EX040120_Continuous_OtterLength.csv.
Draw a histogram and frequency table of the otter lengths by using
『eStat』.
**Answer**
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[19])" src="QR/EX040102.svg" type="image"/>
</div>
<div>
Retrieve the data from 『eStat』 as in [Figure 4.1.5]{.figure-ref}.
![](Figure/Fig040105.png){.imgFig300400}
::: figText
[Figure 4.1.5]{.figure-ref} Data of Otter Length
:::
</div>
</div>
Click the Histogram Icon and then select the variable name
'OtterLength' to draw a histogram as shown in [Figure 4.1.6]{.figure-ref}.
![](Figure/Fig040106.svg){.imgFig600540}
::: figText
[Figure 4.1.6]{.figure-ref} Histogram of the otter length
:::
Click on the \[Frequency Table\] button in the options window below the
histogram ([Figure 4.1.7]{.figure-ref}). Then a frequency table of the histogram
intervals is shown as in [Figure 4.1.8]{.figure-ref} in the Log Area.
![](Figure/Fig040107.png){.imgFig30050}
::: figText
[Figure 4.1.7]{.figure-ref} Options of the histogram
:::
![](Figure/Fig040108.png){.imgFig300400}
::: figText
[Figure 4.1.8]{.figure-ref} Frequency table of histogram for otter length
:::
If you want the histogram intervals to start at 60 cm with an interval
width of 5 cm, set 'Interval Start' to 60 and 'Interval Width' to 5 in
the graph options. Press the \[Execute New Interval\] button to display the
adjusted histogram as shown in [Figure 4.1.9]{.figure-ref}. Click the \[Frequency
Table\] button to reveal a new frequency table as in [Figure 4.1.10]{.figure-ref}.
![](Figure/Fig040109.svg){.imgFig600540}
::: figText
[Figure 4.1.9]{.figure-ref} Adjusted histogram of otter length
:::
![](Figure/Fig040110.png){.imgFig300300}
::: figText
[Figure 4.1.10]{.figure-ref} Adjusted frequency table of the otter length
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[57])" src="QR/PR040102.svg" type="image"/>
</div>
<div>
**Practice 4.1.2** **(Age of Library Visitors)**\
The following data are the ages of 30 people who visited a library in
the morning. Draw an appropriate histogram and its frequency table using
『eStat』.
::: textLeft
28 55 26 35 43 47 47 17 35 36 48 47 34 28 43
:::
::: textLeft
20 30 53 27 32 34 43 18 38 29 44 67 48 45 43
:::
::: textLeft
Ex ⇨ eBook ⇨ PR040102_Continuous_LibraryVisitorAge.csv.
:::
</div>
</div>
:::
:::
:::
## Contingency Table for Two Variables
::: presentation-video-link
[presentation](pdf/0402.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/Vvn1IikPr9M){.video-link target="_blank"}
:::
::: mainTable
![](Icon/eStat_icon18_freq.png){.imgIcon}\
A contingency table or cross table is used to summarize two categorical
variables and to study the association between them. A cross table
divides a table into rows and columns, creating cells from the possible
values of the two categorical variables, and then counts the number of
observations (frequency) belonging to each cell. For further analysis, a
contingency table can show the percentage of each cell relative to its
row total or relative to its column total. The percentage of each cell
relative to the total number of observations can also be shown.
A contingency table is usually made for two qualitative variables. For
two quantitative variables, the data can be transformed into qualitative
data by using intervals, and then a contingency table for these
qualitative data can be created.
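
A minimal Python sketch of cross-tabulation, using the gender and marital status codes of Table 4.2.1. The helper `contingency_table` is hypothetical, not an 『eStat』 function; row percentages are computed relative to each row total.

```python
from collections import Counter

def contingency_table(row_values, col_values):
    """Cross-tabulate two categorical variables observed on the same units."""
    cells = Counter(zip(row_values, col_values))   # frequency of each (row, col) pair
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    return {r: {c: cells[(r, c)] for c in cols} for r in rows}   # missing pairs count 0

# Table 4.2.1: gender (1 Male, 2 Female), marital status (1 Single, 2 Married, 3 Other)
gender = [1, 2, 1, 2, 1, 1, 1, 2, 1, 2]
marital = [1, 2, 1, 1, 2, 1, 1, 2, 3, 1]

table = contingency_table(gender, marital)
for g, row in table.items():
    row_total = sum(row.values())
    line = "  ".join(f"{f} ({100 * f / row_total:.1f}%)" for f in row.values())
    print(g, line, "| total", row_total)

col_totals = {c: sum(row[c] for row in table.values()) for c in table[1]}
print("column totals:", col_totals, "| grand total:", len(gender))
```

Column percentages or percentages of the grand total follow the same pattern with a different denominator.
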
:::
::: mainTableYellow
**Contingency Table**
**Contingency table** or cross table divides a table into rows and
columns to create cells by using the possible values of two categorical
variables, and then counts the number of observations (frequency)
belonging to each cell.
For two quantitative variables, the data can be transformed into
qualitative data by using intervals, and then a contingency table for
these qualitative data can be created.
:::
### Contingency Table for Two Categorical Variables
::: mainTable
Let us discuss how to create a contingency table from the raw data of
two categorical variables using the following example.
:::
::: mainTableGrey
**Example 4.2.1** **(Survey on Gender and Marital Status)**\
[Table 4.2.1]{.table-ref} shows survey data on gender (1: Male, 2: Female) and marital
status (1: Single, 2: Married, 3: Other), which were used in Example
2.2.3. Create a contingency table of the marital status by gender using
『eStat』.
Table 4.2.1 Survey data on gender and marital status
Gender Marital Status
-------- ----------------
1 1
2 2
1 1
2 1
1 2
1 1
1 1
2 2
1 3
2 1
Ex ⇨ eBook ⇨ EX040201_Categorical_MaritalByGender.csv.
**Answer**
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[20])" src="QR/EX040201.svg" type="image"/>
</div>
<div>
Enter the gender and marital status data of [Table 4.2.1]{.table-ref} into
the sheet of 『eStat』 as in [Figure 4.2.1]{.figure-ref}. Use the [Edit Var]{.button-ref} button
to enter the variable name 'Gender' and the value labels 'Male' for 1 and
'Female' for 2. In the same way, enter the variable name 'Marital' and
the value labels 'Single' for 1, 'Married' for 2 and 'Other' for 3.
Data whose value labels have been edited should be saved in a JSON format
file by clicking the JSON Save icon. To load this JSON file later, you
must click the JSON Open icon, which is for loading files in JSON format.
![](Figure/Fig040201.png){.imgFig300300}
::: figText
[Figure 4.2.1]{.figure-ref} Data input on gender and marital status
:::
</div>
</div>
Click the variable name 'Marital' ('Analysis Var'), and then the
variable name 'Gender' ('by Group'). You will then see a bar graph of
the marital status by gender as in [Figure 4.2.2]{.figure-ref}, which is the default
graph. Click the Frequency Table icon to display a contingency table of
the marital status by gender in the Log Area as in [Figure 4.2.3]{.figure-ref}. In
this contingency table, the 'by Group' variable becomes the row variable
and the 'Analysis Var' becomes the column variable. This contingency
table is the one used to draw the bar graph of the marital status by gender
in [Figure 4.2.2]{.figure-ref}.
![](Figure/Fig040202.svg){.imgFig600540}
::: figText
[Figure 4.2.2]{.figure-ref} Bar graph on marital status by gender
:::
![](Figure/Fig040203.png){.imgFig400300}
::: figText
[Figure 4.2.3]{.figure-ref} Contingency table on marital status and gender
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[58])" src="QR/PR040201.svg" type="image"/>
</div>
<div>
**Practice 4.2.1** **(Survey on Gender and Vegetable Preference)**\
In a class of an elementary school, a survey on gender (1: male, 2:
female) and favorite vegetable (1: lettuce, 2: spinach, 3: pumpkin, 4:
eggplant) was conducted. The survey data can be found at the following
location of 『eStat』.
::: textLeft
Ex ⇨ eBook ⇨ PR040201_Categorical_VegetablePrefByGender.csv.
:::
Create a contingency table of the favorite vegetable by gender.
</div>
</div>
:::
### Contingency Table for Two Quantitative Variables
::: mainTable
To create a contingency table for two quantitative variables, we need to
divide all possible values of each quantitative variable into a number
of intervals, as we did when creating a frequency table of a single
quantitative variable.
If both variables are quantitative, it is advisable to use statistical
software such as R, SPSS, or SAS. If one variable is categorical and the
other is quantitative, a contingency table can be made by using
『eStat』. Let's take a look at the following example.
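
For one categorical and one quantitative variable, the idea is to replace each quantitative value by the interval that contains it and then cross-tabulate. The sketch below assumes equal-width intervals $[a, a + w)$; the helper name and the ten ages are hypothetical illustration data, not the data of Example 4.2.2.

```python
from collections import Counter

def binned_contingency_table(groups, values, start, width):
    """Cross-tabulate a group variable against intervals [a, a + width) of a quantitative one."""
    if min(values) < start:
        raise ValueError("start must be at or below the minimum")
    lowers = [start + ((x - start) // width) * width for x in values]  # interval of each value
    n_bins = (max(values) - start) // width + 1
    bins = [start + i * width for i in range(int(n_bins))]             # include empty intervals
    cells = Counter(zip(groups, lowers))
    return {g: {b: cells[(g, b)] for b in bins} for g in sorted(set(groups))}

# Hypothetical ages of ten teachers (illustration only, not Example 4.2.2's data)
gender = ["Male", "Female", "Male", "Female", "Female",
          "Male", "Female", "Male", "Female", "Female"]
age = [34, 28, 51, 45, 39, 42, 26, 58, 33, 47]

width = 10
for g, counts in binned_contingency_table(gender, age, start=20, width=width).items():
    print(g, {f"[{b}, {b + width})": c for b, c in counts.items()})
```

Listing every interval, including empty ones, keeps the columns of the table aligned across groups.
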
:::
::: mainTableGrey
**Example 4.2.2** **(Teacher's Age by Gender)**\
In a middle school, the age and gender of all teachers are surveyed. The
data are saved at the following location of 『eStat』.
Ex ⇨ eBook ⇨ EX040202_Continuous_TeacherAgeByGender.csv.
By using the histogram module of 『eStat』, create a contingency table
of age by gender.
**Answer**
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[21])" src="QR/EX040202.svg" type="image"/>
</div>
<div>
Retrieve the data from 『eStat』 as in [Figure 4.2.4]{.figure-ref} and enter value
labels of 'Gender' as 'Male' for 1 and 'Female' for 2.
![](Figure/Fig040204.png){.imgFig300400}
::: figText
[Figure 4.2.4]{.figure-ref} Data input on gender and age
:::
</div>
</div>
After clicking the histogram icon, select the 'Age' variable as
'Analysis Var', and then the 'Gender' variable as 'by Group'. A
histogram will appear as shown in [Figure 4.2.5]{.figure-ref}.
![](Figure/Fig040205.svg){.imgFig600540}
::: figText
[Figure 4.2.5]{.figure-ref} Histogram on age by gender
:::
If you click the \[Frequency Table\] button in the options window
below the graph ([Figure 4.2.6]{.figure-ref}), a contingency table will appear in
the Log Area as shown in [Figure 4.2.7]{.figure-ref}.
![](Figure/Fig040206.png){.imgFig30050}
::: figText
[Figure 4.2.6]{.figure-ref} Options of the histogram
:::
![](Figure/Fig040207.png){.imgFig300400}
::: figText
[Figure 4.2.7]{.figure-ref} Contingency table of age by gender
:::
If the intervals of the histogram in [Figure 4.2.5]{.figure-ref} are to be
readjusted, for example, to start at 20 years with a width of 10 years, set 'Interval
Start' to 20 and 'Interval Width' to 10 in the graph options and press
the \[Execute New Interval\] button. A histogram with the adjusted
intervals then appears as in [Figure 4.2.8]{.figure-ref}, and a contingency table
with the adjusted intervals can be obtained by clicking the \[Frequency
Table\] button as shown in [Figure 4.2.9]{.figure-ref}.
![](Figure/Fig040208.svg){.imgFig600540}
::: figText
[Figure 4.2.8]{.figure-ref} Histogram with adjusted intervals
:::
![](Figure/Fig040209.png){.imgFig300300}
::: figText
[Figure 4.2.9]{.figure-ref} Contingency table with adjusted intervals
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[59])" src="QR/PR040202.svg" type="image"/>
</div>
<div>
**Practice 4.2.2** **(Oral Cleanliness by Brushing Methods)**\
Oral cleanliness scores according to the brushing method (1: basic
method, 2: rotation method) were examined and are stored at the following
location of 『eStat』.
Ex ⇨ eBook ⇨ PR040202_Continuous_ToothCleanByBrushMethod.csv.
Create a contingency table of oral cleanliness by brushing method.
</div>
</div>
:::
:::
:::
## Summary Measures for Quantitative Variable
::: presentation-video-link
[presentation](pdf/040301.pdf){.presentation-link target="_blank"}
[video](https://youtu.be/Nk8VnTiKybo){.video-link target="_blank"}
:::
::: mainTable
Quantitative data can be summarized by using measures of central
tendency (section 4.3.1) and measures of dispersion (section 4.3.2).
:::
### Measures of Central Tendency
::: mainTable
The mean (average), the median, and the mode are the most frequently used
measures of central tendency for summarizing quantitative data.
A **mean** or **average** is the sum of all data values divided by the
number of data values. If data $x_1, x_2, \cdots, x_N$ are from a
population, their mean is referred to as the population mean and is
usually denoted by the Greek letter $\mu$. It is defined as follows.
$$
\small \mu = \frac{1}{N} \sum_{i=1}^N x_i
$$
If data $x_1, x_2, \cdots, x_n$ are sampled from a population, their
mean is referred to as the sample mean and is denoted by $\overline x$
(read as 'x bar'). The sample mean $\overline x$ is defined as follows.
$$
\small \overline x = \frac{1}{n} \sum_{i=1}^n x_i
$$
Note that the population mean and the sample mean have the same formula
except for notation. Also note that the mean is heavily influenced by an
extreme point, that is, a data value that is either very large or very
small.
The sample mean can be understood as the center of gravity of the sample
data. Therefore, the sum of the deviations obtained by subtracting the
sample mean from each data value is zero, because
$\sum_{i=1}^n x_i = n \overline x$:
$$
\small \sum_{i=1}^n (x_i - \overline x ) = \sum_{i=1}^n x_i - n \overline x = 0
$$
The sample mean has many desirable properties (Chapter 6) and is
frequently used to estimate the population mean.
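
The definition of the mean and the zero-sum property of the deviations can be checked numerically. In this Python sketch the data values are hypothetical; replacing one value by an extreme one also shows how strongly the mean reacts.

```python
def mean(xs):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

data = [2, 4, 9]                          # hypothetical data values
x_bar = mean(data)                        # (2 + 4 + 9) / 3 = 5.0
deviations = [x - x_bar for x in data]    # [-3.0, -1.0, 4.0]
print(x_bar, sum(deviations))             # 5.0 0.0

# One extreme value pulls the mean far away from the bulk of the data
print(mean([2, 4, 90]))                   # 32.0
```

With non-integer data, floating-point rounding can leave the computed sum of deviations as a tiny nonzero number rather than exactly 0.
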
A **median** is the value placed in the middle when the data are listed in
ascending order. It is denoted by $M$ if the data are from a population
and by $m$ if the data are sampled from a population. If the number of
sample data values, $n$, is odd, the median is the
$\left( \frac{n+1}{2} \right)^\text{th}$ value when the data are arranged
in ascending order. If $n$ is even, the median is the average of the
$\left( \frac{n}{2} \right)^\text{th}$ and
$\left( \frac{n+2}{2} \right)^\text{th}$ values.
$$
\begin{align}
m &= \left( \frac{n+1}{2} \right)^\text{th} \text{ value} & \text{if $n$ is odd}\\
  &= \frac{ \left( \frac{n}{2} \right)^\text{th} \text{ value} + \left( \frac{n+2}{2} \right)^\text{th} \text{ value} }{2} & \text{if $n$ is even}
\end{align}
$$
The median is not sensitive to an extreme point in the data, so it is
often used as a measure of central tendency when the data contain an
extreme point.
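
The odd/even rule can be written directly as a function. This is a Python sketch assuming the data are in a list; Python indices start at 0, so the $k^\text{th}$ sorted value is `s[k - 1]`. The data values are hypothetical.

```python
def median(xs):
    """Median by the odd/even rule; the k-th sorted value is s[k - 1]."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]                       # ((n+1)/2)-th value
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2     # mean of (n/2)-th and ((n+2)/2)-th

print(median([9, 2, 4]))      # 4
print(median([2, 4, 90]))     # 4: unaffected by the extreme value 90
print(median([1, 2, 3, 4]))   # 2.5
```
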
A **mode** is the most frequently occurring value among the data values.
$$
\small \textit{Mode} = \text{the most frequently occurring value among data values}
$$
For quantitative data, there may be so many possible values that it is
not reasonable to take the most frequently occurring data value as the
mode. In this case, we usually transform the quantitative data into
qualitative data by dividing the data values into several non-overlapping
intervals and counting the frequency of each interval. The midpoint of
the interval with the highest frequency is taken as the mode.
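
For raw categorical or discrete data, Python's standard `statistics.multimode` (available from Python 3.8) returns the most frequently occurring value or values. The sketch applies it to the gender codes of Table 4.1.1; the second data list is hypothetical.

```python
from statistics import multimode

# Gender codes of Table 4.1.1 (1 = Male, 2 = Female)
gender = [1, 2, 1, 2, 1, 1, 1, 2, 1, 2]
print(multimode(gender))            # [1]: code 1 (Male) occurs most often

# When several values tie for the highest frequency, all of them are returned
print(multimode([3, 3, 5, 5, 7]))   # [3, 5] (hypothetical data)
```
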
:::
::: mainTableYellow
**Mean, Median and Mode**
**Mean** or **average** is the sum of all observed data values divided by the
number of data values. The mean can be understood as the center of gravity
of the data. The population mean is denoted by $\mu$ and the
sample mean by $\overline x$.
**Median** is the value placed in the middle when the data are listed in
ascending order. The population median is usually
denoted by $M$ and the sample median by $m$.
**Mode** is the most frequently occurring value among the data values.
:::
::: mainTableGrey
**Example 4.3.1** **(Quiz scores)**\
The quiz scores of seven randomly sampled students in a Statistics class
are as follows.
::: textLeft
5, 6, 3, 7, 9, 4, 8
:::
::: textLeft
Ex ⇨ eBook ⇨ EX040301_Continuous_QuizScore.csv.
:::
Calculate the mean and the median of these data and compare the results
with the 『eStat』 output.
**Answer**
The sample mean is calculated as follows.
$\qquad \small \overline x ~=~ { {5 + 6 + 3 + 7 + 9 + 4 + 8} \over 7} ~=~ 6$
In order to find the sample median, first arrange the data in ascending
order of data values as follows:
::: textLeft
3, 4, 5, 6, 7, 8, 9
:::
Since the sample size, 7, is odd, the median is the
$\small {\left( \frac{7+1}{2} \right)}^{th} ~=~4^{th}$ value of the
sorted data above, which is 6.
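
The hand calculation can also be checked with Python's standard `statistics` module, as in this short sketch using the quiz scores above.

```python
import statistics

quiz = [5, 6, 3, 7, 9, 4, 8]        # quiz scores of Example 4.3.1
print(sorted(quiz))                 # [3, 4, 5, 6, 7, 8, 9]
print(statistics.mean(quiz))        # 6
print(statistics.median(quiz))      # 6 (the 4th sorted value)
```
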
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[22])" src="QR/EX040301.svg" type="image"/>
</div>
<div>
To use 『eStat』, enter the data in column V1 of the sheet as
in [Figure 4.3.1]{.figure-ref}. Click the Dot Graph icon and click the variable
name 'Quiz' to see the dot graph of the data as in [Figure 4.3.2]{.figure-ref}. If you
check the option 'Mean/StdDev', you can see the location of the mean and the
length of the standard deviation.
![](Figure/Fig040301.png){.imgFig300200}
::: figText
[Figure 4.3.1]{.figure-ref} Data input
:::
</div>
</div>
![](Figure/Fig040302.svg){.imgFig600540}
::: figText
[Figure 4.3.2]{.figure-ref} Dot graph with mean and standard deviation.
:::
If you click the Descriptive Statistics icon, a table of all
descriptive statistics appears in the Log Area as shown in [Figure 4.3.3]{.figure-ref}.
It shows not only the mean and the median, but also other statistics
such as the standard deviation, the minimum, and the maximum.
![](Figure/Fig040303.png){.imgFig500100}
::: figText
[Figure 4.3.3]{.figure-ref} Basic statistics of data
:::
You can also use 『eStatU』 to calculate the descriptive statistics and
simulate the influence of an extreme point. Select 'Dot Graph -- Box Plot --
Descriptive Statistics' from the menu of 『eStatU』 and enter the data as in
[Figure 4.3.4]{.figure-ref}. 『eStatU』 calculates all statistics while you are
entering the data.
![](Figure/Fig040304.png){.imgFig500200}
::: figText
[Figure 4.3.4]{.figure-ref} 『eStatU』 basic statistics of data
:::
If you click the [Execute]{.button-ref} button, two sets of a dot graph and a box plot
appear as in [Figure 4.3.5]{.figure-ref}. The first set is for the data you
entered and the second one is for simulation. On the second dot graph of
[Figure 4.3.5]{.figure-ref}, you can click a point (circle) with your mouse and
drag it to a far end of the axis (creating an extreme point) to check
its influence on the mean and the median. You can see that the mean changes
a lot because of the extreme point, but the median does not change.
![](Figure/Fig040305.svg){.imgFig600400}
::: figText
[Figure 4.3.5]{.figure-ref} 『eStatU』 with simulation of an extreme point
:::
:::
::: mainTablePink
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[60])" src="QR/PR040301.svg" type="image"/>
</div>
<iframe class="example" src="example/ex-4-3.html">
</iframe>
<div>
**Practice 4.3.1** **(Otter Length)**\
The lengths of 30 otters are measured (in cm) and the data are saved at
the following location of 『eStat』.
::: textLeft
Ex ⇨ eBook ⇨ PR040301_Continuous_OtterLength.csv
:::
::: textL20M20
1\) Use 『eStat』 to obtain the mean, median, minimum and maximum of
this data.
:::
::: textL20M20
2\) Copy this data to 『eStatU』 and draw a dot graph and a box plot.
Simulate the influence of an outlier.
:::
</div>
</div>
:::
::: mainTableGrey
**Example 4.3.2** **(Library Visitor)**\
If a frequency table of visitors' age in a library is as shown in Table
4.3.1, find the mode of the age based on this frequency table.
::: textLeft
Table 4.3.1 Frequency table of visitors' ages in a library
:::
Age Interval Frequency
----------------- -----------
\[20.00, 30.00) 2 ( 6.7%)
\[30.00, 40.00) 7 (23.3%)
\[40.00, 50.00) 7 (23.3%)
\[50.00, 60.00) 9 (30.0%)
\[60.00, 70.00) 3 (10.0%)
\[70.00, 80.00) 2 ( 6.7%)
Total 30 (100%)
**Answer**
The interval \[50.00, 60.00) has the highest frequency, 9, so the mode
is the midpoint of the interval \[50.00, 60.00), which is 55.
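
The same lookup can be sketched in Python from the frequencies of Table 4.3.1. Storing the table as a dictionary from the lower bound of each interval to its frequency is an assumption made for illustration; if several intervals tied for the highest frequency, `max` would return only the first of them.

```python
# Table 4.3.1 as {lower bound of the interval [a, a + 10): frequency}
freq_table = {20: 2, 30: 7, 40: 7, 50: 9, 60: 3, 70: 2}
width = 10

modal_lower = max(freq_table, key=freq_table.get)   # interval with the highest frequency
mode = modal_lower + width / 2                       # midpoint of that interval
print(f"modal interval [{modal_lower}, {modal_lower + width}), mode = {mode}")
```
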
:::
::: mainTable
There are several variants that compensate for the mean's sensitivity to
extreme points, one of which is the trimmed mean. The data are listed in
order, a fixed number of the largest and the smallest values are removed
to eliminate the extremes, and the remaining data are averaged. The
trimmed mean is often used to prevent biased judging by referees in
sports such as gymnastics and figure skating at the Olympics. Instead of
removing only the maximum and the minimum, you may remove the top few
percent and the bottom few percent of the data.
Another variant is the weighted mean, in which each measurement is
multiplied by its own weight before averaging. The grade point
average of college students, which uses credit hours as weights, is
an example of the weighted mean. A price index, which uses the total
sales amount of each good as its weight, is another example of the
weighted mean. If $x_{1}, x_{2}, \dots, x_{n}$ are the data values and
their weights are $w_{1}, w_{2}, \dots, w_{n}$, then the weighted mean
is defined as follows. $$
\text{Weighted Mean} ~=~ { {w_{1} x_{1} +w_{2} x_{2} + \cdots + w_{n} x_{n}} \over {w_{1} + w_{2} + \cdots + w_{n}} } ~=~ { {\sum _{i=1} ^{n} w_{i} x_{i}} \over {\sum _{i=1} ^{n} w_{i}} }
$$
:::
::: mainTableYellow
**Trimmed Mean and Weighted Mean**
**Trimmed mean** is the average of the data after removing a fixed number
of the largest and the smallest values in order to eliminate extremes.
**Weighted mean** is the sum of the measurements, each multiplied by its
weight, divided by the sum of all weights.
:::
::: mainTableGrey
**Example 4.3.3** **(Olympic Gymnastics Game)**\
An Olympic gymnastics event was judged by eight referees, and their
scores are as follows:
::: textLeft
9.0 9.5 9.3 7.2 10.0 9.1 9.4 9.0
:::
Find the mean and median of this data. Also, find the trimmed mean which
excludes the minimum and the maximum. Compare both results.
**Answer**
These data are not a sample but a population of eight scores, so the
population mean is calculated as follows.
$\qquad \small \mu ~=~ (9.0 + 9.5 + 9.3 + 7.2 + 10.0 + 9.1 + 9.4 + 9.0) / 8 ~=~ 72.5 / 8 ~=~ 9.063$
Since the number of data is $\small N$ = 8 which is an even number, the
median is the average of the 4th and the 5th data in the ordered list as
follows:
::: textLeft
7.2 9.0 9.0 9.1 9.3 9.4 9.5 10.0
:::
Therefore, the median is the average of 9.1 and 9.3 which is 9.2.
The trimmed mean is the average of the remaining numbers, except the
minimum of 7.2 and the maximum of 10.0.
$\qquad \small \text{Trimmed Mean} ~=~ (9.0 + 9.0 + 9.1 + 9.3 + 9.4 + 9.5) / 6 ~=~ 55.3 / 6 ~=~ 9.217$
For these data, the median and the trimmed mean are more representative
of the data than the arithmetic mean, which is pulled down by the single
low score of 7.2.
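
A Python sketch of the three calculations for the referees' scores; `trimmed_mean` is a hypothetical helper that drops the `k` smallest and `k` largest values. Because decimal scores such as 9.3 are not exact in binary floating point, the results are printed rounded rather than compared exactly.

```python
import statistics

def trimmed_mean(xs, k=1):
    """Mean after removing the k smallest and the k largest values."""
    if len(xs) <= 2 * k:
        raise ValueError("need more than 2*k values")
    kept = sorted(xs)[k:len(xs) - k]
    return sum(kept) / len(kept)

scores = [9.0, 9.5, 9.3, 7.2, 10.0, 9.1, 9.4, 9.0]   # referees' scores, Example 4.3.3

print(f"mean    {sum(scores) / len(scores):.4f}")     # about 9.0625
print(f"median  {statistics.median(scores):.3f}")     # about 9.2
print(f"trimmed {trimmed_mean(scores, k=1):.3f}")     # about 9.217
```
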
:::
::: mainTableGrey
**Example 4.3.4** **(Weighted Mean)**\
A student took three courses last semester, History (two credits), Math
(four credits), and English (three credits), and got an A in History, a
B in Math, and a C in English. Calculate the mean and the weighted mean
with the numbers of credits as weights, if A is rated 4 points, B is 3
points, and C is 2 points.
**Answer**
$\small \qquad \text{Mean} = \frac{4 + 3 + 2}{3} = 3$
$\small \qquad \text{Weighted Mean} = \frac{2 \times 4 + 4 \times 3 + 3 \times 2}{2 + 4 + 3} = \frac{8 + 12 + 6}{9} = \frac{26}{9} \approx 2.89$
The weighted mean is less than the mean because History, in which the
student earned an A, carries only two credits, while English, in which
the student earned a relatively poor C, carries three credits.
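
The two averages can be reproduced with a small Python sketch; `weighted_mean` is a hypothetical helper implementing the weighted-mean formula given earlier in this section.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

points = [4, 3, 2]    # History A, Math B, English C (Example 4.3.4)
credits = [2, 4, 3]   # credit hours used as weights

print(sum(points) / len(points))                 # 3.0
print(f"{weighted_mean(points, credits):.2f}")   # 2.89 (= 26 / 9)
```
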
:::
::: mainTablePink
### Multiple Choice Exercise
Choose one answer and click the Submit button.
::: textL30M30
4.1 Which of the following data sets has an average of 28, a median of
30, and a maximum of 40?
:::
<form name="Q1">
<label><input name="item" type="radio" value="1"/> 12, 20, 30, 40</label><br/>
<label><input name="item" type="radio" value="2"/> 12, 30, 30, 40</label><br/>
<label><input name="item" type="radio" value="3"/> 12, 40, 30, 40</label><br/>
<label><input name="item" type="radio" value="4"/> 12, 40, 20, 40</label><br/>
<p>
<input onclick="radio(4,1,Q1)" type="button" value="Submit"/>
<input id="ansQ1" size="15" type="text"/>