% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
letterpaper,
DIV=11,
numbers=noendperiod]{scrreprt}
\usepackage{amsmath,amssymb}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else
% xetex/luatex font selection
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\usepackage{svg}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{5}
% Make \paragraph and \subparagraph free-standing
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{241,243,245}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.40,0.45,0.13}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\BuiltInTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\ExtensionTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.28,0.35,0.67}{#1}}
\newcommand{\ImportTok}[1]{\textcolor[rgb]{0.00,0.46,0.62}{#1}}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\NormalTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.68,0.00,0.00}{#1}}
\newcommand{\RegionMarkerTok}[1]{\textcolor[rgb]{0.00,0.23,0.31}{#1}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.07,0.07,0.07}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.13,0.47,0.30}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.37,0.37,0.37}{\textit{#1}}}
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\newlength{\cslhangindent}
\setlength{\cslhangindent}{1.5em}
\newlength{\csllabelwidth}
\setlength{\csllabelwidth}{3em}
\newlength{\cslentryspacingunit} % times entry-spacing
\setlength{\cslentryspacingunit}{\parskip}
\newenvironment{CSLReferences}[2] % #1 hanging-ident, #2 entry spacing
{% don't indent paragraphs
\setlength{\parindent}{0pt}
% turn on hanging indent if param 1 is 1
\ifodd #1
\let\oldpar\par
\def\par{\hangindent=\cslhangindent\oldpar}
\fi
% set entry spacing
\setlength{\parskip}{#2\cslentryspacingunit}
}%
{}
\usepackage{calc}
\newcommand{\CSLBlock}[1]{#1\hfill\break}
\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}}
\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break}
\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1}
\KOMAoption{captions}{tableheading}
\makeatletter
\@ifpackageloaded{tcolorbox}{}{\usepackage[skins,breakable]{tcolorbox}}
\@ifpackageloaded{fontawesome5}{}{\usepackage{fontawesome5}}
\definecolor{quarto-callout-color}{HTML}{909090}
\definecolor{quarto-callout-note-color}{HTML}{0758E5}
\definecolor{quarto-callout-important-color}{HTML}{CC1914}
\definecolor{quarto-callout-warning-color}{HTML}{EB9113}
\definecolor{quarto-callout-tip-color}{HTML}{00A047}
\definecolor{quarto-callout-caution-color}{HTML}{FC5300}
\definecolor{quarto-callout-color-frame}{HTML}{acacac}
\definecolor{quarto-callout-note-color-frame}{HTML}{4582ec}
\definecolor{quarto-callout-important-color-frame}{HTML}{d9534f}
\definecolor{quarto-callout-warning-color-frame}{HTML}{f0ad4e}
\definecolor{quarto-callout-tip-color-frame}{HTML}{02b875}
\definecolor{quarto-callout-caution-color-frame}{HTML}{fd7e14}
\makeatother
\makeatletter
\makeatother
\makeatletter
\@ifpackageloaded{bookmark}{}{\usepackage{bookmark}}
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\AtBeginDocument{%
\ifdefined\contentsname
\renewcommand*\contentsname{Table of contents}
\else
\newcommand\contentsname{Table of contents}
\fi
\ifdefined\listfigurename
\renewcommand*\listfigurename{List of Figures}
\else
\newcommand\listfigurename{List of Figures}
\fi
\ifdefined\listtablename
\renewcommand*\listtablename{List of Tables}
\else
\newcommand\listtablename{List of Tables}
\fi
\ifdefined\figurename
\renewcommand*\figurename{Figure}
\else
\newcommand\figurename{Figure}
\fi
\ifdefined\tablename
\renewcommand*\tablename{Table}
\else
\newcommand\tablename{Table}
\fi
}
\@ifpackageloaded{float}{}{\usepackage{float}}
\floatstyle{ruled}
\@ifundefined{c@chapter}{\newfloat{codelisting}{h}{lop}}{\newfloat{codelisting}{h}{lop}[chapter]}
\floatname{codelisting}{Listing}
\newcommand*\listoflistings{\listof{codelisting}{List of Listings}}
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\@ifpackageloaded{subcaption}{}{\usepackage{subcaption}}
\makeatother
\makeatletter
\@ifpackageloaded{tcolorbox}{}{\usepackage[skins,breakable]{tcolorbox}}
\makeatother
\makeatletter
\@ifundefined{shadecolor}{\definecolor{shadecolor}{rgb}{.97, .97, .97}}
\makeatother
\makeatletter
\makeatother
\makeatletter
\makeatother
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
pdftitle={Computational Analysis for Bioscientists},
pdfauthor={Emma Rand},
colorlinks=true,
linkcolor={blue},
filecolor={Maroon},
citecolor={Blue},
urlcolor={Blue},
pdfcreator={LaTeX via pandoc}}
\title{Computational Analysis for Bioscientists}
\usepackage{etoolbox}
\makeatletter
\providecommand{\subtitle}[1]{% add subtitle to \maketitle
\apptocmd{\@title}{\par {\large #1 \par}}{}{}
}
\makeatother
\subtitle{Data Analysis in R and what they forgot to teach you about
computers!}
\author{Emma Rand}
\date{2023-03-01}
\begin{document}
\maketitle
\ifdefined\Shaded\renewenvironment{Shaded}{\begin{tcolorbox}[boxrule=0pt, sharp corners, borderline west={3pt}{0pt}{shadecolor}, breakable, interior hidden, enhanced, frame hidden]}{\end{tcolorbox}}\fi
\renewcommand*\contentsname{Table of contents}
{
\hypersetup{linkcolor=}
\setcounter{tocdepth}{2}
\tableofcontents
}
\bookmarksetup{startatroot}
\hypertarget{welcome}{%
\chapter*{Welcome!}\label{welcome}}
\addcontentsline{toc}{chapter}{Welcome!}
\markboth{Welcome!}{Welcome!}
front page stuff
\bookmarksetup{startatroot}
\hypertarget{about-this-book}{%
\chapter{About this book}\label{about-this-book}}
You are reading a work in progress.
This page is a dumping ground for ideas and not really readable.
Who is this book for
bioscience
undergrads
It is in sections
part 1 what they forgot to teach you
focus on what causes problems for people learning to code.
part 2 Getting started with data. give summary
part 3 Data Analysis, improve name, give summary (babs 2)
When you see.. Try to answer before looking at the code
Your turn! Assign the value of \texttt{4} to a variable called
\texttt{y}:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{y }\OtherTok{\textless{}{-}} \DecValTok{4}
\end{Highlighting}
\end{Shaded}
conventions used in this book
\part{What they forgot to teach you about computers}
\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, toprule=.15mm, arc=.35mm, colback=white, colframe=quarto-callout-important-color-frame, opacityback=0, titlerule=0mm, colbacktitle=quarto-callout-important-color!10!white, leftrule=.75mm, breakable, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-important-color}{\faExclamation}\hspace{0.5em}{Important}, rightrule=.15mm, bottomrule=.15mm, coltitle=black, left=2mm]
You are reading a work in progress. This page is a dumping ground for
ideas and not really readable.
\end{tcolorbox}
Why this part
give a summary of contents
\hypertarget{operating-systems}{%
\chapter{Operating Systems}\label{operating-systems}}
You are reading a work in progress.
This page is a dumping ground for ideas and not really readable.
\hypertarget{what-is-an-operating-system}{%
\section{what is an operating
system}\label{what-is-an-operating-system}}
\hypertarget{types-of-operating-system}{%
\section{types of operating system}\label{types-of-operating-system}}
include Windows, Mac, Unix, tablets, Android, Apple
\hypertarget{differences-in-how-you-use-them}{%
\section{differences in how you use
them}\label{differences-in-how-you-use-them}}
keyboard keys and characters
For RStudio, the section on
\protect\hyperlink{keyboard-short-cuts-and-other-tips}{Keyboard
Shortcuts and tips} will help.
\begin{itemize}
\tightlist
\item
enter / return
\item
control / command
\item
alt / option
\end{itemize}
Finder and Explorer
installing software
\hypertarget{understanding-file-systems}{%
\chapter{Understanding file systems}\label{understanding-file-systems}}
\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, toprule=.15mm, arc=.35mm, colback=white, colframe=quarto-callout-important-color-frame, opacityback=0, titlerule=0mm, colbacktitle=quarto-callout-important-color!10!white, leftrule=.75mm, breakable, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-important-color}{\faExclamation}\hspace{0.5em}{Important}, rightrule=.15mm, bottomrule=.15mm, coltitle=black, left=2mm]
You are reading a work in progress. This page is a dumping ground for
ideas and not really readable.
\end{tcolorbox}
A file is a unit of storage on a computer with a name that uniquely
identifies it. Files can be of different types depending on the sort of
information held in them. The file name very often consists of two
parts, separated by a dot:
\begin{itemize}
\item
name - the base name of the file
\item
extension that should indicate the format or content of the file.
\end{itemize}
Some examples are report.doc, analysis.R, culture.csv and readme.txt.
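
As a minimal sketch, R can split a file name into these two parts. The
functions \texttt{file\_path\_sans\_ext()} and \texttt{file\_ext()} come
from the \texttt{tools} package that is installed with R. The file names
are the examples given above.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# file names taken from the examples in the text}
\NormalTok{files }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"report.doc"}\NormalTok{, }\StringTok{"analysis.R"}\NormalTok{, }\StringTok{"culture.csv"}\NormalTok{, }\StringTok{"readme.txt"}\NormalTok{)}
\CommentTok{\# the base name: the part before the final dot}
\NormalTok{tools}\SpecialCharTok{::}\FunctionTok{file\_path\_sans\_ext}\NormalTok{(files)}
\CommentTok{\# the extension: the part after the final dot}
\NormalTok{tools}\SpecialCharTok{::}\FunctionTok{file\_ext}\NormalTok{(files)}
\end{Highlighting}
\end{Shaded}
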
The relationship between the file extension and the file type
One of the simplest types of file is a ``text file'' which contains text
characters without formatting such as bold or italics and no images or
colours. Plain text files can be opened in any text editor like Windows
Notepad or Mac's TextEdit.
Data is commonly held in text files because they can be read by many
programs
types of file: plain text, markup and markdown
file extensions
the relationship between file extensions and programs
A file system contains files and folders
file systems are hierarchical
\begin{figure}
{\centering \includegraphics{images/file-system.png}
}
\caption{A file hierarchy containing 4 levels of folders and files}
\end{figure}
A folder is also called a directory. Related commands: \texttt{getwd()}
and \texttt{dir()} in R, \texttt{cd} and \texttt{pwd} in Unix,
\texttt{os.getcwd()} in Python
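
A minimal sketch of the R functions named above. What they print depends
on your own machine, so no output is shown. The folder name
\texttt{data} and the file name \texttt{culture.csv} in the last line
are hypothetical and used only for illustration.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# show the folder (directory) R is currently working in}
\FunctionTok{getwd}\NormalTok{()}
\CommentTok{\# list the files and folders in that directory}
\FunctionTok{dir}\NormalTok{()}
\CommentTok{\# build a path to a file inside a folder (hypothetical names)}
\FunctionTok{file.path}\NormalTok{(}\StringTok{"data"}\NormalTok{, }\StringTok{"culture.csv"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
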
using a file explorer, showing file extensions
Paths
root directory
typical structure on windows and mac
Working directory
Relative and absolute paths
save files from the internet chrome://settings/downloads
\hypertarget{organising-your-work}{%
\chapter{Organising your work}\label{organising-your-work}}
You are reading a work in progress.
This page is a dumping ground for ideas and not really readable.
use folder
consistency
naming things
\part{Getting started with data}
\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, toprule=.15mm, arc=.35mm, colback=white, colframe=quarto-callout-important-color-frame, opacityback=0, titlerule=0mm, colbacktitle=quarto-callout-important-color!10!white, leftrule=.75mm, breakable, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-important-color}{\faExclamation}\hspace{0.5em}{Important}, rightrule=.15mm, bottomrule=.15mm, coltitle=black, left=2mm]
You are reading a work in progress. This page is a dumping ground for
ideas and not really readable.
\end{tcolorbox}
why this part
summary of the chapters
general ideas about data and data types
first steps with rstudio
working with data in RStudio
\hypertarget{ideas-about-data}{%
\chapter{Ideas about data}\label{ideas-about-data}}
\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, toprule=.15mm, arc=.35mm, colback=white, colframe=quarto-callout-important-color-frame, opacityback=0, titlerule=0mm, colbacktitle=quarto-callout-important-color!10!white, leftrule=.75mm, breakable, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-important-color}{\faExclamation}\hspace{0.5em}{Important}, rightrule=.15mm, bottomrule=.15mm, coltitle=black, left=2mm]
You are reading a work in progress. This page is a dumping ground for
ideas and not really readable.
\end{tcolorbox}
This chapter covers some important concepts about data. Data is made up of
properties we have measured or recorded, known as variables, and
observations, the individual things with those properties. Data is most
commonly (and helpfully) organised with variables in columns and each
observation on a row.
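
As a sketch of this layout, the data frame below has one column per
variable and one row per observation. The name \texttt{mice}, its column
names and all the values are made up for illustration and are not data
from this book.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# made up values, for illustration only}
\NormalTok{mice }\OtherTok{\textless{}{-}} \FunctionTok{data.frame}\NormalTok{(}
  \AttributeTok{genotype =} \FunctionTok{c}\NormalTok{(}\StringTok{"wild type"}\NormalTok{, }\StringTok{"wild type"}\NormalTok{, }\StringTok{"mutant"}\NormalTok{),}
  \AttributeTok{mass\_g =} \FunctionTok{c}\NormalTok{(}\FloatTok{21.3}\NormalTok{, }\FloatTok{19.8}\NormalTok{, }\FloatTok{23.1}\NormalTok{),}
  \AttributeTok{pups =} \FunctionTok{c}\NormalTok{(}\DecValTok{6}\NormalTok{, }\DecValTok{8}\NormalTok{, }\DecValTok{5}\NormalTok{)}
\NormalTok{)}
\CommentTok{\# number of observations (rows)}
\FunctionTok{nrow}\NormalTok{(mice)}
\CommentTok{\# number of variables (columns)}
\FunctionTok{ncol}\NormalTok{(mice)}
\end{Highlighting}
\end{Shaded}
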
We can define a variable in two main ways:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
by what kinds of value it can take and how frequently each of its
possible values occurs
\item
by what role the variable takes in analysis
\end{enumerate}
Both of these determine how we summarise, plot and analyse data.
\hypertarget{role-in-analysis}{%
\section{Role in analysis}\label{role-in-analysis}}
When we do research, we typically have variables that we choose or set
and variables that we measure. The variables we choose or set are called
independent or explanatory variables. The variables we measure are
called dependent or response variables.
TODO: examples
\hypertarget{kinds-of-value-data-types}{%
\section{Kinds of value: data types}\label{kinds-of-value-data-types}}
The types of values a variable can take determine how we summarise,
plot and analyse it. Sometimes this is obvious: when you have recorded
the colour of an observation you can't find the mean colour of the
sample but you can report the most common colour.
An important distinction is between discrete and continuous types of
data. Continuous variables are measurements that can take any value in
their range. Discrete variables can take only specific values.
\hypertarget{discrete-data}{%
\subsection{Discrete data}\label{discrete-data}}
Discrete variables can take only specific values, like genotype or the
number of leaves.
\hypertarget{nominal-and-ordinal}{%
\subsubsection{Nominal and Ordinal}\label{nominal-and-ordinal}}
Nominal and ordinal data are categorical and often act as explanatory
variables.\\
Nominal variables have no particular order, for example, the eye colour
of Drosophila or the genotype of a mouse. When summarising data on eye
colour, it wouldn't matter what order the information was given or
plotted. Ordinal variables have an order. The Likert scale used in
questionnaires is one example. The possible responses are Strongly
agree, Agree, Disagree and Strongly disagree; these have an order that
you would use when plotting them.
The most appropriate way to summarise nominal or ordinal data is to
report the most frequent values or to tabulate the number of each value.
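
A minimal sketch of such a summary in R, using the Likert responses
described above. The object names \texttt{responses} and \texttt{likert}
and the five responses themselves are made up for illustration. Storing
the variable as an ordered factor keeps the categories in their natural
order when they are tabulated.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# made up questionnaire responses, for illustration only}
\NormalTok{responses }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"Agree"}\NormalTok{, }\StringTok{"Strongly agree"}\NormalTok{, }\StringTok{"Agree"}\NormalTok{, }\StringTok{"Disagree"}\NormalTok{, }\StringTok{"Agree"}\NormalTok{)}
\CommentTok{\# an ordered factor records the order of the categories}
\NormalTok{likert }\OtherTok{\textless{}{-}} \FunctionTok{factor}\NormalTok{(responses,}
  \AttributeTok{levels =} \FunctionTok{c}\NormalTok{(}\StringTok{"Strongly agree"}\NormalTok{, }\StringTok{"Agree"}\NormalTok{, }\StringTok{"Disagree"}\NormalTok{, }\StringTok{"Strongly disagree"}\NormalTok{),}
  \AttributeTok{ordered =} \ConstantTok{TRUE}\NormalTok{)}
\CommentTok{\# tabulate the number of each value}
\FunctionTok{table}\NormalTok{(likert)}
\CommentTok{\# report the most frequent value}
\FunctionTok{names}\NormalTok{(}\FunctionTok{which.max}\NormalTok{(}\FunctionTok{table}\NormalTok{(likert)))}
\end{Highlighting}
\end{Shaded}
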
\hypertarget{counts}{%
\subsubsection{Counts}\label{counts}}
Counts are one of the most common data types. They are quantitative but
discrete because they can take only specific values.
\hypertarget{continuous-data}{%
\subsection{Continuous data}\label{continuous-data}}
Continuous variables are measurements that can take \emph{any} value in
their range so there are an infinite number of possible values. The
values have decimal places. Variables like the length and mass of an
organism, the volume and optical density of a solution, or the colour
intensity of an image are continuous. Many response variables are
continuous but continuous variables can also be explanatory. For
example,
\hypertarget{distributions}{%
\section{Distributions}\label{distributions}}
The distribution of a variable describes the types of values it can take
and the likelihood of each value occurring. For example, for a variable
like human height, values of 1.65 metres occur more often than values of
2 metres and values of 3 metres never occur.
\begin{Shaded}
\begin{Highlighting}[]
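\CommentTok{\# ggplot() and the other plotting functions come from the ggplot2 package}
\FunctionTok{library}\NormalTok{(ggplot2)}
\NormalTok{m }\OtherTok{\textless{}{-}} \FloatTok{1.65}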
\NormalTok{sd }\OtherTok{\textless{}{-}} \FloatTok{0.06}
\FunctionTok{ggplot}\NormalTok{(}\AttributeTok{data =} \FunctionTok{data.frame}\NormalTok{(}\AttributeTok{Height =} \FunctionTok{c}\NormalTok{(m }\SpecialCharTok{{-}} \DecValTok{3} \SpecialCharTok{*}\NormalTok{ sd, m }\SpecialCharTok{+} \DecValTok{3} \SpecialCharTok{*}\NormalTok{ sd)), }\FunctionTok{aes}\NormalTok{(Height)) }\SpecialCharTok{+}
\FunctionTok{stat\_function}\NormalTok{(}\AttributeTok{fun =}\NormalTok{ dnorm, }\AttributeTok{n =} \DecValTok{101}\NormalTok{, }
\AttributeTok{args =} \FunctionTok{list}\NormalTok{(}\AttributeTok{mean =}\NormalTok{ m, }\AttributeTok{sd =}\NormalTok{ sd)) }\SpecialCharTok{+}
\FunctionTok{scale\_y\_continuous}\NormalTok{(}\AttributeTok{breaks =} \ConstantTok{NULL}\NormalTok{, }\AttributeTok{name =} \StringTok{""}\NormalTok{,}
\AttributeTok{expand =} \FunctionTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) }\SpecialCharTok{+}
\FunctionTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\AttributeTok{x =} \FloatTok{1.5}\NormalTok{, }\AttributeTok{y =} \FloatTok{4.5}\NormalTok{,}
\AttributeTok{label =} \StringTok{"Values are rare"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\AttributeTok{x =} \FloatTok{1.65}\NormalTok{, }\AttributeTok{y =} \FloatTok{4.5}\NormalTok{,}
\AttributeTok{label =} \StringTok{"Values are common"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\AttributeTok{x =} \FloatTok{1.8}\NormalTok{, }\AttributeTok{y =} \FloatTok{4.5}\NormalTok{,}
\AttributeTok{label =} \StringTok{"Values are rare"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{theme\_classic}\NormalTok{()}
\end{Highlighting}
\end{Shaded}
\begin{figure}[H]
{\centering \includegraphics{ideas_about_data_files/figure-pdf/unnamed-chunk-2-1.pdf}
}
\end{figure}
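
The curve above is a normal density. As a sketch, and assuming as the
plotting code does that height is modelled as normal with mean 1.65 and
standard deviation 0.06, \texttt{dnorm()} gives the height of the curve
at any value. A larger density means values near that height are more
common.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# same mean and standard deviation as the plot above}
\NormalTok{m }\OtherTok{\textless{}{-}} \FloatTok{1.65}
\NormalTok{sd }\OtherTok{\textless{}{-}} \FloatTok{0.06}
\CommentTok{\# density at a common height}
\FunctionTok{dnorm}\NormalTok{(}\FloatTok{1.65}\NormalTok{, }\AttributeTok{mean =}\NormalTok{ m, }\AttributeTok{sd =}\NormalTok{ sd)}
\CommentTok{\# density at a rare height, which is much smaller}
\FunctionTok{dnorm}\NormalTok{(}\DecValTok{2}\NormalTok{, }\AttributeTok{mean =}\NormalTok{ m, }\AttributeTok{sd =}\NormalTok{ sd)}
\end{Highlighting}
\end{Shaded}
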
\hypertarget{the-normal-distribution}{%
\subsection{The normal distribution}\label{the-normal-distribution}}
\hypertarget{distribution-of-counts}{%
\subsection{Distribution of counts}\label{distribution-of-counts}}
\hypertarget{theory-and-practice}{%
\section{Theory and practice}\label{theory-and-practice}}
The distinction between continuous and discrete values is clear in
theory but in practice, the actual values you have might differ from
what we would expect for a particular variable. For example, we would
expect the mass of cats to be continuous but if our scales only measure
to the nearest kilogram then the values we record will be discrete.
\begin{Shaded}
\begin{Highlighting}[]
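\CommentTok{\# combining the plots with a + b below assumes the patchwork package}
\FunctionTok{library}\NormalTok{(patchwork)}
\NormalTok{m }\OtherTok{\textless{}{-}} \DecValTok{4}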
\NormalTok{sd }\OtherTok{\textless{}{-}} \FloatTok{0.8}
\FunctionTok{set.seed}\NormalTok{(}\DecValTok{1234}\NormalTok{)}
\NormalTok{a }\OtherTok{\textless{}{-}} \FunctionTok{ggplot}\NormalTok{(}\AttributeTok{data =} \FunctionTok{data.frame}\NormalTok{(}\AttributeTok{Mass =} \FunctionTok{c}\NormalTok{(m }\SpecialCharTok{{-}} \DecValTok{3} \SpecialCharTok{*}\NormalTok{ sd, m }\SpecialCharTok{+} \DecValTok{3} \SpecialCharTok{*}\NormalTok{ sd)), }\FunctionTok{aes}\NormalTok{(Mass)) }\SpecialCharTok{+}
\FunctionTok{stat\_function}\NormalTok{(}\AttributeTok{fun =}\NormalTok{ dnorm, }\AttributeTok{n =} \DecValTok{101}\NormalTok{, }
\AttributeTok{args =} \FunctionTok{list}\NormalTok{(}\AttributeTok{mean =}\NormalTok{ m, }\AttributeTok{sd =}\NormalTok{ sd)) }\SpecialCharTok{+}
\FunctionTok{scale\_y\_continuous}\NormalTok{(}\AttributeTok{breaks =} \ConstantTok{NULL}\NormalTok{, }\AttributeTok{name =} \StringTok{""}\NormalTok{, }
\AttributeTok{expand =} \FunctionTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) }\SpecialCharTok{+}
\FunctionTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\AttributeTok{x =}\NormalTok{ m }\SpecialCharTok{{-}} \DecValTok{2} \SpecialCharTok{*}\NormalTok{ sd, }\AttributeTok{y =} \FloatTok{0.4}\NormalTok{,}
\AttributeTok{label =} \StringTok{"Theory"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{theme\_classic}\NormalTok{()}
\NormalTok{b }\OtherTok{\textless{}{-}} \FunctionTok{ggplot}\NormalTok{(}\AttributeTok{data =} \FunctionTok{data.frame}\NormalTok{(}\AttributeTok{Mass =} \FunctionTok{round}\NormalTok{(}\FunctionTok{rnorm}\NormalTok{(}\DecValTok{1000}\NormalTok{, m, sd), }\DecValTok{0}\NormalTok{)), }\FunctionTok{aes}\NormalTok{(Mass)) }\SpecialCharTok{+}
\FunctionTok{geom\_histogram}\NormalTok{(}\AttributeTok{binwidth =} \DecValTok{1}\NormalTok{, }\AttributeTok{colour =} \StringTok{"black"}\NormalTok{, }\AttributeTok{fill =} \StringTok{"white"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{scale\_y\_continuous}\NormalTok{(}\AttributeTok{breaks =} \ConstantTok{NULL}\NormalTok{, }\AttributeTok{name =} \StringTok{""}\NormalTok{,}
\AttributeTok{expand =} \FunctionTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) }\SpecialCharTok{+}
\FunctionTok{annotate}\NormalTok{(}\StringTok{"text"}\NormalTok{, }\AttributeTok{x =}\NormalTok{ m }\SpecialCharTok{{-}} \DecValTok{2} \SpecialCharTok{*}\NormalTok{ sd, }\AttributeTok{y =} \DecValTok{400}\NormalTok{,}
\AttributeTok{label =} \StringTok{"Practice"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{theme\_classic}\NormalTok{()}
\NormalTok{a }\SpecialCharTok{+}\NormalTok{ b}
\end{Highlighting}
\end{Shaded}
\begin{figure}[H]
{\centering \includegraphics{ideas_about_data_files/figure-pdf/unnamed-chunk-3-1.pdf}
}
\end{figure}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{m }\OtherTok{\textless{}{-}} \DecValTok{120000}
\NormalTok{sd }\OtherTok{\textless{}{-}} \DecValTok{20000}
\FunctionTok{set.seed}\NormalTok{(}\DecValTok{12}\NormalTok{)}
\FunctionTok{ggplot}\NormalTok{() }\SpecialCharTok{+}
\FunctionTok{geom\_histogram}\NormalTok{(}\AttributeTok{data =} \FunctionTok{data.frame}\NormalTok{(}\AttributeTok{hairs =} \FunctionTok{round}\NormalTok{(}\FunctionTok{rnorm}\NormalTok{(}\DecValTok{60000}\NormalTok{, m, sd), }\DecValTok{0}\NormalTok{)),}
\FunctionTok{aes}\NormalTok{(hairs),}
\AttributeTok{bins =} \DecValTok{120}\NormalTok{, }\AttributeTok{colour =} \StringTok{"black"}\NormalTok{, }\AttributeTok{fill =} \StringTok{"white"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{scale\_y\_continuous}\NormalTok{(}\AttributeTok{breaks =} \ConstantTok{NULL}\NormalTok{, }\AttributeTok{name =} \StringTok{""}\NormalTok{,}
\AttributeTok{expand =} \FunctionTok{c}\NormalTok{(}\DecValTok{0}\NormalTok{, }\DecValTok{0}\NormalTok{)) }\SpecialCharTok{+}
\FunctionTok{scale\_x\_continuous}\NormalTok{(}\StringTok{"Number of hairs on head"}\NormalTok{) }\SpecialCharTok{+}
\FunctionTok{theme\_classic}\NormalTok{()}
\end{Highlighting}
\end{Shaded}
\begin{figure}[H]
{\centering \includegraphics{ideas_about_data_files/figure-pdf/unnamed-chunk-4-1.pdf}
}
\end{figure}
\hypertarget{first-steps-in-rstudio}{%
\chapter{First Steps in RStudio}\label{first-steps-in-rstudio}}
You are reading a work in progress.
This page is almost readable but is a first draft and may need substantial
edits.
This chapter starts by explaining what R and RStudio are and how you can
install them on your own machine. We introduce you to working in
RStudio, changing its appearance to suit you and to the key things you
need to know about R.
\hypertarget{what-are-r-and-rstudio}{%
\section{What are R and RStudio?}\label{what-are-r-and-rstudio}}
\hypertarget{what-is-r}{%
\subsection{What is R?}\label{what-is-r}}
R is a programming language and environment for statistical computing
and graphics which is free and open source. It is widely used in
industry and academia. It is what is known as a ``domain-specific''
language meaning that it is designed especially for doing data analysis
and visualisation rather than a ``general-purpose'' programming language
like Python and C++. It makes doing the sorts of things that
bioscientists do a bit easier than in a general-purpose language.
\hypertarget{what-is-rstudio}{%
\subsection{What is RStudio?}\label{what-is-rstudio}}
RStudio is what is known as an ``integrated development environment''
(IDE) for R made by \href{https://posit.co/}{Posit}. IDEs have features
that make it easier to do coding like syntax highlighting, code
completion and viewers for files, code objects, packages and plots. You
don't have to use RStudio to use R but it is very helpful.
\hypertarget{why-is-it-better-to-use-r-than-excel-googlesheets-or-some-other-spreadsheet-program}{%
\subsection{Why is it better to use R than Excel, Google Sheets or some
other spreadsheet
program?}\label{why-is-it-better-to-use-r-than-excel-googlesheets-or-some-other-spreadsheet-program}}
Spreadsheet programs are not statistical packages, so although you can
carry out some analysis tasks in them, they are
\href{https://www.gapintelligence.com/blog/understanding-r-programming-over-excel-for-data-analysis/}{limited},
get things wrong
(\href{https://www.sciencedirect.com/science/article/abs/pii/0167947394901775}{known
about since 1994}) and
\href{https://www.teampay.co/blog/biggest-excel-mistakes-of-all-time}{teach
you bad data habits}. Spreadsheets encourage you to do things that are
\href{https://datacarpentry.org/2015-05-03-NDIC/excel-ecology/02-common-mistakes.html}{going
to make analysis difficult}.
\hypertarget{why-is-it-better-to-use-r-than-spss-minitab-or-some-other-menu-driven-statistics-program}{%
\subsection{Why is it better to use R than SPSS, Minitab or some other
menu-driven statistics
program?}\label{why-is-it-better-to-use-r-than-spss-minitab-or-some-other-menu-driven-statistics-program}}
\begin{itemize}
\tightlist
\item
R is free and open source, which means it will always be available to you.
\item
Carrying out data analysis using code makes everything you do
reproducible.
\item
The skills and expertise you gain through learning R are highly
transferable -- much more so than those acquired using SPSS.
\item
See Thomas Mock's demonstration of doing some data analysis in R,
including ``The Kick Ass Curve'':
\url{https://rstudio.com/resources/webinars/a-gentle-introduction-to-tidy-statistics-in-r/}
\end{itemize}
There are other good options such as Julia and Python and you are
encouraged to explore these. We chose R in part because of the R
community, which is one of R's greatest assets, being vibrant, inclusive
and supportive of users at all levels
(\url{https://ropensci.org/blog/2017/06/23/community/}).
\hypertarget{installing-r-and-rstudio}{%
\section{Installing R and RStudio}\label{installing-r-and-rstudio}}
You will need to install both R and RStudio to use them on your own
machine. Installation is normally straightforward but you can follow a
tutorial here:
\url{https://learnr-examples.shinyapps.io/ex-setup-r/\#section-welcome}
\hypertarget{installing-r}{%
\subsection{Installing R}\label{installing-r}}
Go to \url{https://cloud.r-project.org/} and download the ``Precompiled
binary distributions of the base system and contributed packages''
appropriate for your machine.
\hypertarget{for-windows}{%
\subsubsection{For Windows}\label{for-windows}}
Click ``Download R for Windows'', then ``base'', then ``Download R
4.\#.\# for Windows''. This will download an \texttt{.exe} file. Once
downloaded, open (double click) that file to start the installation.
\hypertarget{for-mac}{%
\subsubsection{For Mac}\label{for-mac}}
Click ``Download R for (Mac) OS X'', then ``R-4.\#.\#.pkg'' to download
the installer. Run the installer to complete installation.
\hypertarget{for-linux}{%
\subsubsection{For Linux}\label{for-linux}}
Click ``Download R for Linux''. Instructions on installing are given for
Debian, Red Hat, SUSE and Ubuntu distributions. Where there is a choice,
install both \texttt{r-base} and \texttt{r-base-dev}.
\hypertarget{installing-r-studio}{%
\subsection{Installing RStudio}\label{installing-r-studio}}
Go to \url{https://posit.co/download/rstudio-desktop/} and download the
RStudio Desktop installer for your operating system.
\hypertarget{install-the-tidyverse-package}{%
\section{\texorpdfstring{Install the \textbf{\texttt{tidyverse}}
package}{Install the tidyverse package}}\label{install-the-tidyverse-package}}
To install the \textbf{\texttt{tidyverse}} package, run the following
command in R:
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{install.packages}\NormalTok{(}\StringTok{"tidyverse"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
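As a minimal sketch, assuming the installation finished without errors,
you can ask R whether the package is now among those installed on your
machine. This uses the base R function \texttt{installed.packages()} and
should print \texttt{TRUE} if \textbf{\texttt{tidyverse}} was installed
and \texttt{FALSE} otherwise:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Is "tidyverse" among the names of the installed packages?}
\StringTok{"tidyverse"} \SpecialCharTok{\%in\%} \FunctionTok{rownames}\NormalTok{(}\FunctionTok{installed.packages}\NormalTok{())}
\end{Highlighting}
\end{Shaded}
Installation is only needed once on each machine. In each new R session
you would then typically load the package with
\texttt{library(tidyverse)} before using it.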
\hypertarget{introduction-to-rstudio}{%
\section{Introduction to RStudio}\label{introduction-to-rstudio}}
In this section we will introduce you to working in RStudio. We will
explain the windows that you see when you first open RStudio and how to
change its appearance to suit you. Then we will see how we use R as a
calculator and how to assign values to R objects.
\hypertarget{changing-the-appearance}{%
\subsection{Changing the appearance}\label{changing-the-appearance}}
When you first open RStudio it will display three panes and have a white
background (Figure~\ref{fig-rstudio-first-open}).
\begin{figure}
{\centering \includegraphics[width=8.33333in,height=\textheight]{images/rstudio-first-open.png}
}
\caption{\label{fig-rstudio-first-open}When you first open RStudio it
will be white with three panes}
\end{figure}
We will talk more about these three panes soon but first, let's get into
character - the character of a programmer! You might have noticed that
people comfortable around computers often use dark backgrounds. A
dark background can reduce eye strain and often makes ``code syntax'' more
obvious, making code faster to learn and understand at a glance. Code
syntax is the set of rules that define what the various combinations of
symbols mean. It takes time to learn these rules and you will learn best
by repeated exposure to writing, reading and copying code. You have done
this before when you learned your first spoken language. All languages
have syntax rules governing the order of words and we rarely think about
these consciously, instead relying on what sounds and looks right. And
what sounds and looks right grows out of repeated exposure. For example,
35\% of languages, including English, Chinese, Yoruba and Polish, use the
Subject-Verb-Object syntax rule:
\begin{itemize}
\tightlist
\item
English: Emma likes R
\item
Chinese: 艾玛喜欢R (Emma xǐhuān R)
\item
Yoruba: Emma fẹran R
\item
Polish: Emma lubi R
\end{itemize}
and 40\% use Subject-Object-Verb, including Turkish and Korean:
\begin{itemize}
\tightlist
\item
Turkish: Emma R'yi seviyor
\item
Korean: 엠마는 R을 좋아한다 (emmaneun Reul joh-ahanda)
\end{itemize}
You learned this rule in your language very early, long before you were
conscious of it, just by being exposed to it frequently. In this book I
try to tell you the syntax rules, but you will learn most from looking
at, and copying, code. Because of this, it is well worth tinkering with
the appearance of RStudio to see which Editor theme makes code elements
most obvious to you.
There is a menu bar at the top of RStudio. Choose the \texttt{Tools}
option and then \texttt{Global\ Options}. This will open a window where
many options can be changed
(Figure~\ref{fig-tools-global-options-appearance}).
\begin{figure}
{\centering \includegraphics[width=8.33333in,height=\textheight]{images/tools-global-options-appearance.png}
}
\caption{\label{fig-tools-global-options-appearance}Tools \textbar{}
Global Options opens a window. One of the options is Appearance}
\end{figure}
Go to the \texttt{Appearance} options and choose an Editor theme you
like, followed by OK.
The default theme is Textmate. You will notice that all the Editor
themes have syntax highlighting so that keywords, variable names,
operators, etc. are coloured, but some themes have stronger contrasts than
others. For beginners, I recommend Vibrant Ink, Chaos or Merbivore
rather than Dreamweaver or Gob, which have little contrast between some
elements. However, individuals differ, so experiment for yourself. I tend
to vary between Solarized Light and Solarized Dark.
You can also turn on Screen Reader Support in the Accessibility options
in Tools \textbar{} Global Options.
Back to the panes. You should be looking at three windows: one on the
left and two on the right\footnote{If this is not a fresh install of
RStudio, you might be looking at four windows, two on the left and
two on the right. That's fine - we will all be using four shortly. For
the time being, you might want to close the ``Script'' window using
the small cross next to ``Untitled1''.}.
The window on the left, labelled Console, is where R commands are
executed. In a moment we will start by typing commands in this window.
Over on the right-hand side, at the top, we have several tabs, with the
Environment tab showing. This is where all the objects and data that you
create will be listed. Behind the Environment tab is the History tab, and
later you will be able to view this to see a history of all your
commands.
On the bottom right hand side, we have a tab called Plots which is where
your plots will go, a tab called Files which is a file explorer just
like Windows Explorer or Mac Finder, and a Packages tab where you can
see all the packages that are installed. The Packages tab also provides
a way to install additional packages. The Help tab has access to all the
manual pages.
Right, let's start coding!
\hypertarget{your-first-piece-of-code}{%
\subsection{Your first piece of code}\label{your-first-piece-of-code}}
We can use R just like a calculator. Put your cursor after the
\texttt{\textgreater{}} in the Console, type \texttt{3+4} and press ↵
Enter to send that command:
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{3}\SpecialCharTok{+}\DecValTok{4}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 7
\end{verbatim}
The \texttt{\textgreater{}} is called the ``prompt''. You do not have to
type it; it tells you that R is ready for input.
Where I've written \texttt{3+4}, I have no spaces. However, you
\emph{can} have spaces, and in fact, it's good practice to use spaces
around your operators because it makes your code easier to read. So a
better way of writing this would be:
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{3} \SpecialCharTok{+} \DecValTok{4}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 7
\end{verbatim}
In the output we have the number \texttt{7}, which, obviously, is the
answer. From now on, you should assume that commands typed at the Console
must be followed by ↵ Enter to send them.
The number in square brackets, \texttt{{[}1{]}}, is an index. It is telling you
that the \texttt{7} is the first element of the output. We can see this
more clearly if we create something with more output. For example,
\texttt{50:100} will print the numbers from 50 to 100.
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{50}\SpecialCharTok{:}\DecValTok{100}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
[20] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
[39] 88 89 90 91 92 93 94 95 96 97 98 99 100
\end{verbatim}
The numbers in square brackets at the beginning of each line give
you the index of the first element in that line. R is telling you where
you are in the output.
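To check this for yourself, here is a small sketch that counts the
values in \texttt{50:100} and picks out single values by their position.
The \texttt{length()} function and the use of square brackets after the
sequence to select one element are not explained in this section and are
used here only for illustration:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# How many values are there, and which values sit at positions 20 and 39?}
\FunctionTok{length}\NormalTok{(}\DecValTok{50}\SpecialCharTok{:}\DecValTok{100}\NormalTok{)}
\NormalTok{(}\DecValTok{50}\SpecialCharTok{:}\DecValTok{100}\NormalTok{)[}\DecValTok{20}\NormalTok{]}
\NormalTok{(}\DecValTok{50}\SpecialCharTok{:}\DecValTok{100}\NormalTok{)[}\DecValTok{39}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 51
[1] 69
[1] 88
\end{verbatim}
The 20th value is 69 and the 39th is 88, matching the values that follow
\texttt{{[}20{]}} and \texttt{{[}39{]}} in the printed output above. How
many values fit on one line depends on the width of your Console, so the
indices at the start of your lines may differ.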
\hypertarget{assigning-variables}{%
\subsection{Assigning variables}\label{assigning-variables}}
Very often we want to keep input values or output for future use. We do
this with ``assignment''. An assignment is a statement in programming that
is used to give a value to a variable name. In R, the operator used to do
assignment is \texttt{\textless{}-}. It assigns the value on the
right-hand side to the name on the left-hand side.
To assign the value \texttt{3} to \texttt{x} we do:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{x }\OtherTok{\textless{}{-}} \DecValTok{3}
\end{Highlighting}
\end{Shaded}
and press ↵ Enter to send that command.
The assignment operator is made of two characters, the
\texttt{\textless{}} and the \texttt{-}, and there is a keyboard
shortcut: Alt+- (Windows) or Option+- (Mac). Using the shortcut means you'll
automatically get spaces around the operator. You won't see any output when
the command has been executed because assignment does not print anything.
However, you will see
\texttt{x} listed under Values in the Environment tab (top right).
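As a minimal sketch of how assignment and the calculator fit together,
you can store the result of a calculation and then look at it by typing
the variable's name on its own. The name \texttt{total} is just an
illustrative choice:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Store the result of 3 + 4, then print the stored value}
\NormalTok{total }\OtherTok{\textless{}{-}} \DecValTok{3} \SpecialCharTok{+} \DecValTok{4}
\NormalTok{total}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
[1] 7
\end{verbatim}
The first line produces no output; the second prints the value that was
stored.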
Your turn! Assign the value of \texttt{4} to a variable called
\texttt{y}:
\begin{Shaded}
\begin{Highlighting}[]