-
Notifications
You must be signed in to change notification settings - Fork 2
/
WolakAndReid_GenGroupHowTo_SI6.Rmd
1629 lines (1028 loc) · 120 KB
/
WolakAndReid_GenGroupHowTo_SI6.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: |
| *Supporting Information to*
| **Accounting for genetic differences among unknown parents in microevolutionary studies: How to include genetic groups in quantitative genetic animal models**
author: Matthew E. Wolak & Jane M. Reid
output:
github_document:
toc: true
toc_depth: 4
---
```{r init_libraries, include = FALSE}
library(MCMCglmm)
library(nadiv)
library(asreml)
```
# Appendix S6: Construction and implementation of Q and A*
Fitting animal models with genetic groups first necessitates formulating the appropriate **Q** and/or <b>A\*</b> matrices. First, we generally describe how **Q** and <b>A\*</b> are formed, with an emphasis on the code used to do this in the `R` package `nadiv` (>v2.14.0; Wolak 2012). This is followed by explanations of minor changes that can be made to <b>A\*</b> to improve animal model convergence and how to deal with warnings from software regarding singularities and non-positive definite matrices. Next are sections with code that illustrate fitting genetic group animal models in the `MCMCglmm` and `asreml` packages in `R`, the **ASReml** standalone program, and finally the **WOMBAT** standalone program. In each of these four sections, code and examples are provided to fit genetic groups using the `nadiv` package in `R` to create the **Q** and <b>A\*</b> matrices that can then be supplied to these animal model fitting programs (in some cases outside of the `R` environment), followed by examples using existing functionality within each software program (see Table S6.1 for a summary of these programs and their capabilities).
The tutorials for these software programs all use the simulated dataset (`ggTutorial` available in `nadiv` or to download in **WOMBAT** and **ASReml** standalone formats) depicted in Figs 2 and 4 of the main text and described below. In *Appendix S4*, the `R` code in the `nadiv` function `simGG()` that is used to simulate the `ggTutorial` dataset is explained. Details of package and software program versions are available at the end of the supporting information.
**Table S6.1.** Overview of animal model software highlighted in the tutorials. The genetic group capabilities row contains a description of the functionality that is available solely within each software program. The **Form Q** and **Form** <b>A\*</b> rows indicate if each program can create the **Q** or <b>A\*</b> matrices, respectively, from a supplied pedigree. We indicate whether fitting genetic groups explicitly (as fixed effects) or implicitly within the random effects structure (as either fixed or random effects) is recommended (+) or not (-) when using each software program.
| | **`MCMCglmm`** | **`ASReml`** in **`R`** | **ASReml** | **WOMBAT** |
|:---------------------------------|:---------------|:------------------------|:--------------------|:-----------|
| **Genetic group capabilities** | None in `MCMCglmm`, but can use <b>A\*</b> or **Q** created with `nadiv` | The `asreml.Ainverse()` function creates <b>A\*</b> | `!GROUPS` qualifier to the pedigree creates <b>A\*</b> and see the `!LAST` and `!GOFFSET` qualifiers (details below) | Random genetic group effects (with separate variance component from the additive genetic variance) by including `GENGROUPS` in a `SPECIAL` block with a supplied **Q** (created with `nadiv`)|
| **Form Q** | No | No | No | No |
| **Form** <b>A\*</b> | No | Yes | Yes | No |
| **Explicit approach** | + | + | + | + |
| **Implicit approach** | | | | |
| *Fixed* | - | + | + | - |
| *Random* | + | + | + | + |
| **Free to use** | Yes | No | No | Yes |
| **Reference** | Hadfield 2010 | Butler et al. 2009 | Gilmour et al. 2014 | Meyer 2007 |
<!--------- ------------ ----------- ------------->
## 6.1 A small example pedigree
Below we demonstrate how the matrices **Q** and <b>A\*</b> are constructed using an example pedigree from Quaas (1988). A detailed version of the Quaas pedigree is available in the `nadiv` package in `R` as the `Q1988` dataset:
<br>
```{r quaas_table, results = 'asis', echo = FALSE}
knitr::kable(Q1988)
```
The table above is essentially just a pedigree (first three columns indicate an individual's unique identifying code, dam, and sire). However, two rows have been added at the top of the pedigree for the unique genetic groups and extra columns are included for ease of using all possible pedigree formats accepted by the `nadiv` functions that implement genetic group methods.
In the dataset, "g1" and "g2" are two genetic groups, lower case letters are phantom parent identities, and upper case letters are observed individual identities. The genetic groups and phantom parents are assigned as in Quaas (1988). Additional details regarding the `Q1988` dataset in `nadiv` are in the `R` help file that can be viewed by running the command `?Q1988` in an `R` session.
Below, we will only deal with observed individuals in this dataset, such that the subset of `Q1988` we will use is:
<br>
```{r quaas_subset}
Q1988sub <- Q1988[-c(3:7), c("id", "damGG", "sireGG")]
```
```{r quaas_subTable, echo = FALSE}
knitr::kable(Q1988sub, row.names = FALSE)
```
Note, this is not the only format for including genetic group information in functions of the `nadiv` package. To see a full description of alternative formats see the function help files, particularly for the `ggcontrib` (to make **Q**) and `makeAinv` (to make <b>A\*</b>) functions.
<!--------- ------------ ----------- ------------->
## 6.2 Constructing Q
As described in the main text, the matrix **Q** contains the fractional contributions from each of *r* genetic groups to each individual's genome, calcluated from the pedigree (see also Robinson 1986; Mrode 2005, ch. 2). Therefore, if there are *n* individuals in the pedigree (e.g., *n*=`r sum(!is.na(Q1988sub[, 2]))` in `Q1988sub`), **Q** will be an *n* row by *r* column matrix. In Henderson's (1976) decomposition of the numerator relationship matrix (**A**=**TDT'**), the lower triangle matrix **T** reflects the contribution of each individual in the pedigree to every other individual. In other words, **T** traces the flow of genes from one generation to another. Therefore, if we add the *r* genetic groups to the top of the pedigree (originally of length *n*) as identities with unknown dam and sire (NA) and fill in genetic group labels as dam and sire identities of every individual with a missing parent (such as how `Q1988sub` is arranged) we can obtain the **T** matrix for a pedigree of length *n*+*r*. The first *r* columns of the **T** matrix contain the fractional contributions from each genetic group. Therefore, **Q** for the *n* individuals in a pedigree is the *r*+1 to *r*+*n* rows and the first *r* columns of **T**.
The `nadiv` function `ggcontrib()` constructs **Q** for a given pedigree with *r* potential genetic group contributions to each of the *n* individuals in the pedigree. For example, **Q** for `Q1988sub` is obtained:
```{r makeQ_quaasSub}
(Q <- ggcontrib(Q1988sub))
```
Each row in **Q** (each entry is a proportional contribution) sums to one and offspring inherit half of each genetic group contribution from each of their parents. Individuals `A` and `C` in `Q1988sub` each have a phantom dam from genetic group "g1" and phantom sire from genetic group "g2". These individuals correspond to the first and third rows of **Q** above. As expected, the two genetic groups (each column in **Q**) contribute equally to the genomes of `A` and `C`.
Individual `B` has observed dam `A` and a phantom sire from genetic group "g2". Row two of **Q** above is a sum of each parents' row (genetic group contributions) divided by two: dam `A` `[0.500 0.500]/2` plus phantom sire "g2" `[0.000 1.000]/2`. In other words, individual `B` gets half of dam `A`'s "g1" group contribution (`0.500/2`) and its phantom sire from "g2" does not contribute to the proportion of `B`'s genome derived from "g1". Similarly, individual `B` gets half of dam `A`'s "g2" group contribution (`0.500/2`) plus half of the phantom sire "g2" contribution (`1.000/2`).
Finally, individual `D` has observed dam `B` and observed sire `C`. Therefore, row four of **Q** above is the sum of each parents' row (genetic group contributions) divided by two: dam `B` `[0.250 0.750]/2` plus sire `C` `[0.500 0.500]/2`.
Implementing an animal model with genetic group effects fitted explicitly as separate fixed regressions requires the **Q** matrix. Because every row of **Q** sums to one, it is not possible to solve an animal model to obtain estimates for every genetic group effect and an overall model intercept. Thus, genetic group effects are not estimable themselves, but can only be estimated when expressed as deviations (or functions) from other group effects (Lynch & Walsh 1998, p.839-841). Therefore, a column/genetic group is chosen as the "reference group" and is assigned a genetic group effect of 0 (see practical implementation of this below). All other group effects will represent deviations from this reference group. For example, we might choose genetic group "g1" in the `Q1988sub` pedigree as our reference group. The estimate of the fixed regression slope from the second column of **Q** equals the "g2" genetic group mean expressed as a deviation from the reference group ("g1") mean of zero. If we had used a pedigree with more than two genetic groups, still setting the first group as the reference, then the genetic group estimates for each group are all deviations from the "g1" reference group mean of zero.
**Q** is also needed to calculate breeding values (**a**) from an animal model that has fitted fixed genetic group effects implicitly within the random effects syntax of the model software. This is because such a model will yield predictions of total additive genetic effects (**u**). From equation (6) in the main text, an individual's breeding value (a<sub>i</sub>) equals the total additive genetic effect (u<sub>i</sub>) minus a weighted sum of genetic group effects contributing to that individual. The weighted sum of genetic group contributions for all individuals can be calculated as the matrix product **Qg**, where **g** is a vector of all genetic group effects and **Q** is calculated above.
<!--------- ------------ ----------- ------------->
## 6.3 Constructing A*
A major advance leading to the widespread use of animal models comes from Henderson (1976) and subsequent researchers' (e.g., Quaas 1976, 1995; Meuwissen and Luo 1992) work to directly construct the matrix inverse of the additive genetic relatedness matrix (<b>A<sup>-1</sup></b>). Direct methods for obtaining <b>A<sup>-1</sup></b> are necessary, because <b>A<sup>-1</sup></b> and not **A** is used when solving the equations in the animal model. <b>A<sup>-1</sup></b> has a unique structure that can be formed by tracing the flow of alleles from each individual to its offspring. Thus methods to directly construct <b>A<sup>-1</sup></b> rely on the basic premise that the flow of alleles through a pedigree can be traced because each individual inherits, on average, half of its alleles from each parent with some sampling variation that can be modeled based on the Mendelian sampling variance in the population.
To estimate genetic group effects by implicitly fitting the genetic groups within the random effect syntax of the model software, requires an inverse matrix to model the covariance among individual total additive genetic values and group effects based on alleles shared identical by descent. Conveniently, the matrix inverse accounting for individual-individual, group-group, group-individual, and individual-group sharing of alleles can be constructed directly from the pedigree in a very similar manner to <b>A<sup>-1</sup></b> (see also *Appendix S5*; Westell, Quaas & Van Vleck 1988; Quaas 1988). This matrix, <b>A\*</b> (following notation of Quaas 1988), is a compilation of four sub-matrices that are each a mathematical function of **Q**, <b>A<sup>-1</sup></b>, or both (Fig. 3e in main text).
The <b>A\*</b> for any pedigree with genetic groups can be obtained using the `makeAinv()` function in the `nadiv` package. For example, <b>A\*</b> for the `Q1988sub` pedigree is:
```{r makeAstar_quaasSub}
AstarOut <- makeAinv(Q1988sub, ggroups = 2)
(Astar <- AstarOut$Ainv)
```
There are several points to highlight regarding <b>A\*</b>. First, this matrix can be split into four blocks: (1) the upper-left 4-by-4 (*n*x*n*) block (individual-individual covariance structure), (2) the lower-left 2-by-4 (*r*x*n*) block (group-individual covariance structure), (3) the upper-right 4-by-2 (*n*x*r*) block (individual-group covariance structure), and (4) the lower-right 2-by-2 (*r*x*r*) block (group-group covariance structure). Note, the upper-left block of <b>A\*</b> is the same as the <b>A<sup>-1</sup></b> for the `Q1988` pedigree without implicitly including genetic group effects (see also Fig. 3e in main text). For example, <b>A<sup>-1</sup></b> for the Q1988 pedigree without genetic groups can be obtained and compared to <b>A\*</b> above using the same `nadiv` function without invoking the genetic group algorithm [`makeAinv(..., ggroups = NULL, ...)`, the default value of the `ggroups` argument]:
```{r makeAinv_quaasSub}
makeAinv(Q1988[-c(1:7), c("id", "dam", "sire")])$Ainv
```
<b>A\*</b> constructed in `nadiv` can be used in various animal model software programs (Table S6.1) to implicitly include genetic group effects within the random effects structure of the model. Although the details vary among software programs (see tutorials in the subsections to *Appendix S6.4* for detailed instructions in each of the highlighted software programs), the general approach is to associate the inverse covariance matrix <b>A\*</b> with the term in the model representing additive genetic effects; almost exactly the same way as the typical <b>A<sup>-1</sup></b> is associated with additive genetic effects in a basic animal model. However, some potential alterations to <b>A\*</b> should be considered before and during the model fitting process.
<!--------- ------------ ----------- ------------->
### 6.3.1 Genetic groups on top or bottom of A*?
The entries in <b>A\*</b> corresponding to genetic groups can be placed in the last *r* rows and columns (as above) or the first *r* rows and columns. When setting up <b>A\*</b>, switching between these placements is trivial. The consequences for using <b>A\*</b>, however, can be rather large. Solving an animal model requires solving a large set of equations (the Mixed Model Equations or MME). Often, some of these equations may not be unique or in other words two equations may be exactly the same, very similar, or have a linear dependency (such that knowing the solution to one equation means the solution to the other equation is also known). This is a cause of singularities in the MME coefficient matrix of an animal model.
The MME are solved by either moving from the first set of equations to the last set ('top-down') or from the last set to the first set ('bottom-up'). In either case, if two equations are nearly the same then the model will obtain a solution for the first of the pair, but then encounter a problem with the second. Some software packages might automatically deal with this second, non-unique equation. For example, **ASReml** solves the equations from the bottom up and will delete the second equation (equation on top) to continue to solve the MME (Butler et al. 2009, pp. 134-135; Gilmour et al. 2014, pp. 106-107).
Unfortunately, the algorithms to solve the MME are not this simple in practice. Even though **ASReml** states which way the equations are solved, some options allow the program to re-order the equations to simplify solving the model. This re-ordering is based on the contents of each equation without respect to its categorization in the model. In other words, one genetic group equation might be re-ordered so that it is placed next to an equation for a year random effect level while another genetic group equation in the same model may be placed elsewhere with a random effect of individual additive genetic value. Although one cannot be completely certain where the genetic group equations will end up when the software program optimizes the order of equations to most efficiently solve the model, the ability to easily switch where in <b>A\*</b> the genetic groups are positioned has been a strategy that has sometimes worked well in our own personal experience. Switching the placement of the genetic groups from the top to the bottom (or vice versa) of <b>A\*</b> has sometimes solved convergence issues.
However, the standalone version of **ASReml** has the `!LAST` job qualifier (see Table S6.1 and the **ASReml** tutorial in *Appendix S6.4.4*) to consistently re-order the MME such that the genetic groups equations are solved last. This option assumes that the genetic groups are the first *r* rows and columns of <b>A\*</b>, where *r* is the number of levels following the `!LAST` qualifier, and therefore 'on top'. We are not aware of a similar argument that can be implemented for a model using `asreml` in `R`.
**WOMBAT**, also, has several MME ordering strategies that can be implemented. However, these re-order the entire set of equations and the order of certain levels of random effects cannot be specified (Meyer 2007, section 5.3.1). `MCMCglmm` re-orders the MME according to algorithms associated with the underlying `CSparse` functions (Davis 2006; Hadfield 2010) and also cannot designate certain levels of random effects to a certain location in the re-ordered equations. Therefore, it is not clear how specifying genetic groups at the top or bottom of <b>A\*</b> will affect the convergence of any particular model.
The `nadiv` function `makeAinv()`, enables the user to place genetic groups at the top or bottom of <b>A\*</b>, in an attempt to solve genetic group equations first or last. Therefore, if any singularities in the MME coefficient matrix occur then the user can attempt to re-order the genetic group equations in the hope that this will lead to model convergence. The `gOnTop` argument to the `makeAinv()` function switches between placing the genetic groups on top or on bottom of <b>A\*</b>, with the default to place genetic groups at the bottom of <b>A\*</b> [`makeAinv(..., gOnTop = FALSE, ...)`].
<!--------- ------------ ----------- ------------->
### 6.3.2 Removing singularities and other problems within A*
Fitting genetic group effects implicitly often causes singularities in the coefficient matrix of the MME, that can sometimes be overcome by slight changes to the strategy for grouping (Schaeffer 1991). Software programs may report this issue as a non-positive definite generalized inverse matrix (often abbreviated GIV, GIN, or ginverse), or alternatively stating that a GIV/GIN/ginverse is negative definite, non-positive semi-definite, or ill-conditioned. For example, **ASReml** often deals with these types of singularities according to its own strategies (Butler et al. 2009; Gilmour et al. 2014). If genetic groups are specified in an **ASReml** analysis and a singularity occurs the program will introduce a row to the matrix to overcome this (see *Lagrangian multipliers* in Gilmour et al. 2014, pp. 164-165). Regardless of which software program is used, however, adding values to some of the matrix elements may also remove singularities occurring within the MME in addition to the aforementioned strategy of ordering <b>A\*</b> with genetic groups either on top or at the bottom.
Adding a small number to some diagonals can remove singularities (Schaeffer 1994, 1999; Oikawa and Yasuda 2009; Gilmour 2010). Schaeffer (1994, 1999) advocates adding an identity matrix to the *r*x*r* group-group block of <b>A\*</b> such that all of the diagonals (d<sub>ii</sub>) become 1+d<sub>ii</sub>. This addition to <b>A\*</b> means that the genetic group effects are no longer, conceptually speaking, fixed effects but are random effects with expectations of zero and a variance describing their distribution (see *Appendices S1 & S5*). Further, predicted genetic group effects will be biased toward the expected value of zero when genetic groups are random effects, just like any other random effects, with the amount of bias depending on how much information the data provide to predict genetic group effects (Hadfield et al. 2010). This bias will extend to any functions involving the predicted genetic group effects, specifically the predicted total additive genetic effects (**u**). Although, in many cases the mixed model equations and subsequent variance component estimates change only slightly, thus basic interpretations of results will remain the same (but see cautionary note below; further discussion in Schaeffer 1994). The strategy of adding an identity matrix is desirable if there are many individuals assigned to each group, because it does not re-rank the predicted breeding values and can improve convergence (Schaeffer 1994). The main change to the model when the leading diagonal elements of the group-group portion of <b>A\*</b> are altered is a (co)variance matrix for traits within groups is added to the model equations dealing with the genetic group effects. In the simple case of a single trait, this variance is assumed to be the same across group effects with no covariances between groups (i.e., groups effects are normally distributed with a variance of **I** sigma<sup>2</sup><sub>g</sub>). In practice this variance is often assumed to equal the additive genetic variance for the individual total additive genetic effects (Sullivan 1999, see also **WOMBAT**'s genetic groups in *Appendix S6.4.5.4*).
A further interpretation of this assumption means that a model's estimate of additive genetic variance will also include the magnitudes of the genetic group effects. This may cause problems if the variance in the genetic group effects themselves is much greater than the true additive genetic variance. In such a case, adding a smaller value than 1 to the diagonals (d<sub>ii</sub>) of <b>A\*</b> structures the model to reflect this difference. Using values other than 1 for the addition to the diagonal elements effectively means that the ratio of the variance among genetic group effects to the additive genetic variance is no longer constrained to be 1:1. Thus, when modelling genetic groups as random effects we strongly caution users to carefully consider the sensitivity of the additive genetic variance estimate to the choice of value added to the group-group diagonal elements of <b>A\*</b>. Multiple models fitted with different values added to the diagonal elements are recommended to ensure an unbiased estimate of the additive genetic variance. Schaeffer (1994) provides a detailed discussion and interpretation of different strategies for altering <b>A\*</b>.
Note that **WOMBAT** fits a separate variance component to the genetic group effects (see *Appendix S6.4.5.4*), thus offering a direct way to check that the variance in genetic group effects can be assumed to be the same as the additive genetic variance. **ASReml** provides the `!GOFFSET` pedigree file qualifier to add a value to the group-group diagonal elements when using **ASReml**'s `!GROUPS` pedigree qualifier (see *Appendix S6.4.4.4*). Otherwise, this alteration of <b>A\*</b> is easily accomplished manually after creating <b>A\*</b> with the `makeAinv()` function in the `nadiv` package. Altering the <b>A\*</b> for the `Q1988sub` pedigree above (with genetic groups on the bottom) would proceed as follows and result in the matrix displayed below:
```{r Astar_plus_I}
n <- 4
r <- 2
(ggIdentity <- bdiag(Diagonal(n = n, x = 0), Diagonal(n = r, x = 1)))
(AstarI <- Astar + ggIdentity)
```
Caution is necessary, however, when dealing with singularities in an animal model. Specifically, model comparison may not be possible if singularities occur. **ASReml** alters the order of equations to remove singularities. Consequently, different terms can cause singularities between runs of the model, where the model may have been slightly modified for the purposes of testing the significance of variance components or computing profile likelihoods of variance components. Singularities caused by different terms will affect the log-likelihood of the model and therefore procedures using likelihood ratio test statistics may no longer be valid (Gilmour 2005; Butler et al. 2009; Gilmour et al. 2014)
<!------------------------------------------------------------->
<!--------- ------------ ----------- ------------->
## 6.4 How to fit genetic groups in MCMCglmm, ASReml-R, ASReml-standalone, & WOMBAT
```{r knit_resetwd, include = FALSE}
#FIXME knitr::opts_knit$set(root.dir = "~/Dropbox/AberdeenPostdoc/GeneticGroups/GGreview_MS/")
```
<!--------- ------------ ----------- ------------->
### 6.4.1 Tutorial dataset `ggTutorial`
We provide a step-by-step demonstration of how to fit genetic groups in animal models, by modelling genetic group effects either explicitly as separate fixed regressions or implicitly within the random effects, using the different animal model software programs. All demonstrations use the dataset `ggTutorial`, which is available in the `nadiv` package for working in `R`, as supplementary file `ggTutorial.dat` for working in the standalone **ASReml** program, and as supplementary file `ggTutorial.d` for working in the **WOMBAT** program. This dataset was simulated using the `nadiv` package (details in *Appendix S4*) and are the same data as those depicted in Figs 2 and 4 from the main text. The dataset contains `r nrow(ggTutorial)` individuals across `r max(ggTutorial$gen)` generations. The simulation specifies a carrying capacity of 400 individuals per generation, 200 mating pairs per generation, and 40 immigrants per generation. The expected difference between founder and immigrant mean breeding values equals `3` and both the environmental and additive genetic variances for both the founder and immigrant groups equal `1`. The data are loaded into `R` by loading the `nadiv` package.
```{r librarynadiv, eval = FALSE}
library(nadiv)
```
The first three columns of `ggTutorial` contain the complete pedigree and all `r nrow(ggTutorial)` individuals have a phenotypic record in `p`.
```{r dataset_view}
head(ggTutorial)
str(ggTutorial)
```
The rest of the columns contain:
* `parAvgU` is the average *u* (total additive genetic effect) of each individual's parents
* `mendel` is the Mendelian sampling deviation from the mid-parent value that is unique to each individual
* `u` is the total additive genetic effect of each individual's genotype
* `r` is the residual deviation of each individual
* `p` is the phenotype for each individual (population expected mean is 20)
* `is` indicates the immigration status, where focal population residents have `is == 0` and immigrants have `is == 1`
* `gen` is the generation in which each individual was born
The supplementary files for use in **ASReml** and **WOMBAT** contain a subset of these variables plus three derived variables. The **ASReml** and **WOMBAT** data files contain continuous variables for the coefficient of inbreeding (*f*) and two genetic group genomic contributions (from **Q**). For demonstration purposes, these are added to the basic `ggTutorial` below within each section. Note the **WOMBAT** pedigree and data files have had 10000 added to the individual identity codes to comply with **WOMBAT** data structure requirements.
It is necessary to fit a fixed regression on the coefficient of inbreeding (*f*) to account for inbreeding depression and minimize bias in estimates of quantitative genetic parameters (Kennedy, Schaeffer & Sorensen 1988; Reid & Keller 2010; Wolak & Keller 2014). This is not directly part of the process to fit genetic groups in the animal model, but it is considered necessary when fitting animal models to population data in which inbreeding occurs.
-----------------------------------------------------
<!------------------------------------------------------------->
<!--------- ------------ ----------- ------------->
### 6.4.2 `MCMCglmm`
Modelling genetic groups either explicitly as separate fixed regressions or implicitly within the random effects syntax of `MCMCglmm` requires either **Q** or <b>A\*</b> to be created using the `nadiv` package. Below we demonstrate how to create these matrices and fit each in a `MCMCglmm` model.
```{r mcmc_packages, eval = FALSE}
library(MCMCglmm)
library(nadiv)
```
To complete the basic preparations of the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (*f*; to minimize bias from inbreeding depression, see the last paragraph of *Appendix 6.4.1* above). This calculation can be done using the `inverseA()` function of `MCMCglmm`. Consequently, the <b>A<sup>-1</sup></b> matrix is also created, which we will assign as its own object in `R`, because it will be needed in the model fitting genetic group effects explicitly as fixed covariate regressions with **Q** (see *Appendix 6.4.2.2*).
```{r mcmc_makeAinvF}
ainvOut <- inverseA(ggTutorial[, 1:3])
Ainv <- ainvOut$Ainv
ggTutorial$f <- ainvOut$inbreeding
```
`MCMCglmm` facilitates modelling of non-Gaussian response variables, which has greatly extended the application of animal models in evolutionary ecology to a range of non-Gaussian phenotypes. Fitting genetic groups is no different when modelling non-Gaussian response variables. In such models, the genetic group effects (fitted either explicitly as separate fixed regressions or implicitly within the random effects) are estimated on the underlying latent scale.
#### 6.4.2.1 Preparing a pedigree with genetic groups
Here, we provide general instructions to prepare pedigrees. Calculating the matrix of genetic group contributions (**Q**) and the augmented inverse relatedness matrix (<b>A\*</b>) requires a pedigree that indicates groups instead of missing values for parents. In the `ggTutorial` data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (`foc0`). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (`g1`). Alternatively, this second group could be further divided into genetic groups based on generation. However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which the immigrant was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (`ggPed`).
```{r mcmc_makePed}
ggPed <- ggTutorial[, c("id", "dam", "sire", "is", "gen")]
naPar <- which(is.na(ggPed[, 2]))
ggPed$GG <- rep("NA", nrow(ggPed))
# 'focal' genetic group = "foc0" and 'immigrant' = "g1"
# obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively
ggPed$GG[naPar] <- as.character(ggPed$is[naPar])
ggPed$GG[ggPed$GG == "0"] <- paste0("foc", ggPed$GG[ggPed$GG == "0"])
ggPed$GG[ggPed$GG == "1"] <- paste0("g", ggPed$GG[ggPed$GG == "1"])
ggPed[naPar, 2:3] <- ggPed[naPar, "GG"]
```
Note, the approach and format of the pedigree above is different from the `Q1988sub` pedigree earlier - mostly because the format here makes it easier to specify genetic groups in a pedigree for this particular case, but partly to illustrate the flexibility of `nadiv` functions. To be more specific, the `Q1988sub` pedigree contained two extra rows for the two genetic groups. The `ggroups` argument to `makeAinv()` supplied an integer indicating how many rows at the begining of the pedigree contained genetic groups and not individuals. However, in the `ggPed` above no extra rows are added to the pedigree, but in the next few sections the character vector given to the `ggroups` argument in both `ggcontrib()` and `makeAinv()` specifies the name of the unique genetic groups that have been filled in for individuals instead of missing dam and sire identities.
<!--------- ------------ ----------- ------------->
#### 6.4.2.2 Fixed explicit genetic group effects with Q (from `nadiv`)
Fitting genetic group effects explicitly requires the columns of **Q** to be included as separate fixed covariate regressions. The code below creates **Q** for the `ggPed` pedigree and adds the columns (`foc0` and `g1`) as variables in `ggTutorial` so that they can be included in a model.
```{r mcmc_makeQ}
Q <- ggcontrib(ggPed[, 1:3], ggroups = c("foc0", "g1"))
ggTutorial <- cbind(ggTutorial, Q)
str(ggTutorial)
```
An important point to make is that, because the pedigree and data for `ggTutorial` are in the same dataframe, the genetic group coefficients from **Q** can be added directly to the data with `ggTutorial <- cbind(ggTutorial, Q)` above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the **Q** matrix contains a row for every individual in the pedigree, only the subset of rows in **Q** that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
The fixed regressions on `foc0` and `g1` can be combined with the standard <b>A<sup>-1</sup></b> to specify an animal model in `MCMCglmm`. Below, we write out the specification of the `MCMCglmm` default priors for the fixed effects in the model. We recommend a normal prior distribution for each genetic group fixed regression coefficient with a mean of 0 and a very large variance. Our fixed effect prior specification below has an element in `mu` and `V` of `prPE$B` to correspond with the model's: intercept, regression on the coefficient of inbreeding (*f*), and regressions on the genetic group contributions (`foc0` and `g1`). Because unique solutions to the model do not exist (see discussion of estimability in *Appendix S6.2*) for the intercept, coefficient of inbreeding (*f*), and genetic group effects (`foc0` and `g1`), we fit the `foc0` genetic group as the last fixed effect. This ensures that the model sets `foc0` as a reference group and estimates the `g0` genetic group effect as a deviation from the mean breeding value in the `foc0` group. The model is written out below, and a saved version is available on Dryad (Wolak & Reid 2016) as supporting information file "./MCMCglmm/ggRegMC.RData" so that it is not necessary to re-run this model to examine the results.
```{r mcmc_ggReg, eval = FALSE}
# Use diffuse normal priors (MCMCglmm defaults) for all fixed effects
# Parameter expanded prior on additive genetic variance
## Scaled F-dsitribution with numerator and denominator df=1
prPEexp <- list(B = list(mu = rep(0, 3), V = diag(3)*1e10),
R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
# NOTE: the saved model could be loaded instead of running it below
## load(file = "./MCMCglmm/ggRegMC.RData")
ggRegMC <- MCMCglmm(fixed = p ~ f + g1 + foc0,
random = ~ id,
ginverse = list(id = Ainv),
data = ggTutorial,
prior = prPEexp,
singular.ok = FALSE,
pr = TRUE,
nitt = 28000, burn = 3000, thin = 25)
```
```{r mcmc_load_ggReg, include = FALSE}
load(file = "./MCMCglmm/ggRegMC.RData")
```
The posterior distributions of genetic group effects are included in the `Sol` object along with the posterior distributions of the other fixed effects. Note the software has dropped the `foc0` group estimate in this model, implying this is the reference group and the `g1` genetic group effects is the estimated deviation from the reference. To report the posterior modes and 95% limits of the highest posterior density:
```{r mcmc_post_ggReg}
with(ggRegMC, cbind(postMode = posterior.mode(Sol[, 1:3]),
HPDinterval(Sol[, 1:3])))
```
The above posterior mode estimate of `r round(posterior.mode(ggRegMC$Sol[, "g1"]), 3)` agrees with the expected difference of `3` between the mean breeding values of the immigrant group (`3`) and the founder population (`0`), specified in the simulation.
The posterior distribution of the breeding values (**a**) can be accessed from `ggRegMC$Sol[, -c(1:3)]`. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the posterior distribution for the total additive genetic effects (**u**) can be calculated as (eqn 5 in main text):
```{r mcmc_calcU}
# Add 0 genetic group effect for the reference group 'foc0' to the 'g1' samples
regPost_gHat <- rbind(0, ggRegMC$Sol[, "g1"])
# For the ith mcmc iteration that was saved, calculate 'u'
regPost_u <- as.mcmc(t(sapply(seq(nrow(ggRegMC$Sol)),
FUN = function(i){Q %*% regPost_gHat[, i] + ggRegMC$Sol[i, -c(1:3)]})))
```
<!--------- ------------ ----------- ------------->
#### 6.4.2.3 Random implicit genetic group effects with A* (from `nadiv`)
Fitting genetic group effects implicitly within the random effects portion of the `MCMCglmm` model specification requires the augmented inverse **A** matrix (<b>A\*</b>) to be supplied as a sparse matrix in the `ginverse` argument. The code below creates <b>A\*</b> for the `ggPed` pedigree using the `makeAinv` function in the `nadiv` package.
```{r mcmc_makeAstar_hidden, include = FALSE}
Astar <- makeAinv(ggPed[, 1:3], ggroups = c("foc0", "g1"), gOnTop = TRUE)$Ainv
```
```{r mcmc_makeAstar_noneval, eval = FALSE}
Astar <- makeAinv(ggPed[, 1:3], ggroups = c("foc0", "g1"), gOnTop = TRUE)$Ainv
```
In the above code, we directed `makeAinv()` to include the genetic groups on top (top-left block) of <b>A\*</b> using the `gOnTop = TRUE` argument. At this point we could try fitting the model with this version of <b>A\*</b> (i.e., `Astar`). However, the model returns an error that `G-structure 1 is ill-conditioned`. In other words, <b>A\*</b> has at least one eigenvalue that is either negative or zero. Re-defining the groupings can eliminate such a singularity. However, removing the singularity in the matrix while maintaining the current groupings is accomplished by adding a small value to the diagonal elements of the group-group portion of the matrix. Note that such an addition changes the interpretation of the effects in the model, as discussed above (*Appendix 6.3.2* as well as *Appendix S5*), now implying that the genetic group effects are random effects in the model. Below, we add 0.1 to the diagonal element of the <b>A\*</b> which will constrain the variance in genetic group effects to be one tenth the additive genetic variance estimated by the model. A value of 0.1 may bias estimates of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.
```{r mcmc_Astar_plusI}
AstarAdd <- Astar
(ggrows <- match(c("foc0", "g1"), dimnames(AstarAdd)[[1]]))
diag(AstarAdd)[ggrows] <- diag(AstarAdd)[ggrows] + 0.1
Astar[1:3, 1:3]
AstarAdd[1:3, 1:3]
```
The `AstarAdd` sparse inverse matrix can now be associated with the individual identities (`id`) by supplying it as an argument to `ginverse` in the `MCMCglmm` model. We specify the same default prior distribution for the fixed effects below (i.e., the same ones as we wrote out in *Appendix 6.4.2.2*). The prior distribution for the `id` term in the model is the prior used for the additive genetic variance and, in this case, also the same prior for the variance of the genetic group effects. It is unclear how the prior specified for the additive genetic variance will affect the estimate of the genetic group effects or even what prior is to be used in this context. As usual with Bayesian analyses, we recommend priors that are framed within the context of the data at hand, the model being fitted, and any *a priori* belief in the model parameters. Further we suggest testing the sensitivity of any particular prior used for a given model and especially when genetic group effects are treated as random effects by the model.
The model is written out below, but a saved version is available on Dryad (Wolak & Reid 2016) as supporting information file "./MCMCglmm/ggAstarRandMC.RData" so that it is not necessary to re-run this model to examine the results.
```{r mcmc_ggAstar, eval = FALSE}
# Parameter expanded prior on additive genetic variance
prPEimp <- list(B = list(mu = rep(0, 2), V = diag(2)*1e10),
R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
# NOTE: the saved model could be loaded instead of running it below
## load(file = "./MCMCglmm/ggAstarRandMC.RData")
ggAstarRandMC <- MCMCglmm(fixed = p ~ f,
random = ~ id,
ginverse = list(id = AstarAdd),
data = ggTutorial,
prior = prPEimp,
pr = TRUE,
nitt = 53000, burn = 3000, thin = 50)
```
```{r mcmc_load_ggAstar, include = FALSE}
load(file = "./MCMCglmm/ggAstarRandMC.RData")
```
Alternatively, the non-altered matrix `Astar` could be supplied to the `ginverse` argument of `MCMCglmm` to see the types of warnings that led us to fit the model with the matrix `AstarAdd`.
The posterior distributions of genetic group effects are included in the `Sol` object (as long as `pr = TRUE` in the model statement) along with the posterior distributions of the fixed effects. To report the posterior modes and 95% limits of the highest posterior density:
```{r mcmc_post_ggAstar}
with(ggAstarRandMC, cbind(postMode = posterior.mode(Sol[, 1:4]),
HPDinterval(Sol[, 1:4])))
```
Note that all fixed effects in this model are estimable and the model returns these estimates along with predicting both of the genetic group effects. The genetic group predictions, themselves, are random effects and thus are deviations from their expected value of 0. The posterior mode of differences between genetic group predictions `r round(posterior.mode(with(ggAstarRandMC, Sol[, 4] - Sol[, 3])), 3)` (95% HPD interval:`r paste0(round(HPDinterval(with(ggAstarRandMC, Sol[, 4] - Sol[, 3])), 3), collapse = ", ")`) agrees with the expected difference between the mean breeding values of the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The posterior distribution of the total additive genetic effects (**u**) can be accessed from `ggAstarRandMC$Sol[, -c(1:4)]`. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the posterior distribution for the breeding values (**a**) can be calculated as (eqn 6 in main text):
```{r mcmc_calcA}
AstarPost_gHat <- ggAstarRandMC$Sol[, 3:4]
# For the ith mcmc iteration that was saved, calculate 'a'
# Assuming Q has already been calculated as above
AstarPost_a <- as.mcmc(t(sapply(seq(nrow(ggAstarRandMC$Sol)),
FUN = function(i){ggAstarRandMC$Sol[i, -c(1:4)] -
Q %*% matrix(AstarPost_gHat[i, ], ncol = 1)})))
```
-----------------------------------------------
<!------------------------------------------------------------->
<!--------- ------------ ----------- ------------->
### 6.4.3 `ASReml` in `R`
To model genetic group effects explicitly as separate fixed regressions in an animal model using the `R` package `asreml` requires **Q** to be created using the `nadiv` package and then included in the `asreml()` model function. Including genetic group effects implicitly within the random effects of an animal model in `asreml()` can be done either using <b>A\*</b> constructed with `nadiv` or with `asreml::asreml.Ainverse()`. All three of these approaches will be demonstrated below using the simulated dataset `ggTutorial`:
```{r asremlr_packages, eval = FALSE}
library(asreml)
library(nadiv)
```
To complete the basic preparations of the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (*f*; to minimize bias from inbreeding depression, see the last paragraph of *Appendix 6.4.1*). This calculation can be done using the `asreml.Ainverse()` function of `asreml`. Consequently, the <b>A<sup>-1</sup></b> matrix is also created, which we will assign as its own `R` object, because it will be needed in the model fitting genetic group effects explicitly as separate fixed covariate regressions with **Q** below (*Appendix 6.4.3.2*).
```{r asremlr_makeAinvF}
ainvOut <- asreml.Ainverse(ggTutorial[, 1:3])
listAinv <- ainvOut$ginv
ggTutorial$f <- ainvOut$inbreeding
## Have to also convert the 'id' variable to a factor for models below
ggTutorial$id <- as.factor(ggTutorial$id)
```
#### 6.4.3.1 Preparing a pedigree with genetic groups
Calculating the matrix of genetic group contributions (**Q**) and the augmented inverse relatedness matrix (<b>A\*</b>) requires a pedigree that has genetic groups indicated instead of missing values for parents. In the `ggTutorial` data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (`foc0`). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (`g1`). Alternatively, this second group could be further divided into genetic groups based on generation. However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which it was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (`ggPed`).
```{r asremlr_makePed}
ggPed <- ggTutorial[, c("id", "dam", "sire", "is", "gen")]
naPar <- which(is.na(ggPed[, 2]))
ggPed$GG <- rep("NA", nrow(ggPed))
# 'focal' genetic group = "foc0" and 'immigrant' = "g1"
# obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively
ggPed$GG[naPar] <- as.character(ggPed$is[naPar])
ggPed$GG[ggPed$GG == "0"] <- paste0("foc", ggPed$GG[ggPed$GG == "0"])
ggPed$GG[ggPed$GG == "1"] <- paste0("g", ggPed$GG[ggPed$GG == "1"])
ggPed[naPar, 2:3] <- ggPed[naPar, "GG"]
# Two rows need to be added for the genetic groups
## Genetic groups will be given the missing value NA instead of parent identities
ggPed <- data.frame(id = c("foc0", "g1", as.character(ggPed$id)),
dam = c(NA, NA, as.character(ggPed$dam)),
sire = c(NA, NA, as.character(ggPed$sire)))
head(ggPed)
```
Note that the `nadiv` function `makeAinv()` used to construct <b>A\*</b> is more flexible than `asreml`'s `asreml.Ainverse()` in the way pedigrees containing genetic groups can be formatted (for more detail on `makeAinv()` formats, see the help file by running the command `?makeAinv` in an `R` session - particularly the examples at the end of the file - or the genetic group pedigree used above in the `MCMCglmm` tutorial). To be more specific, the format required by `asreml.Ainverse()` is similar to the `Q1988sub` pedigree used earlier in the tutorial (*Appendix 6.1*) which contains two extra rows for the two genetic groups.
#### 6.4.3.2 Fixed explicit genetic group effects with Q (from `nadiv`)
Fitting genetic group effects explicitly in `asreml` requires the columns of **Q** to be included as separate fixed covariate regressions. The code below creates **Q** for the `ggPed` pedigree and adds the columns (`foc0` and `g1`) as variables in `ggTutorial` so that they can be included in a model.
```{r asremlr_hidden_makeQ, include = FALSE}
Q <- ggcontrib(ggPed)
```
```{r asremlr_silent_makeQ, eval = FALSE}
Q <- ggcontrib(ggPed)
ggTutorial <- cbind(ggTutorial, Q)
```
```{r asremlr_addQ}
str(ggTutorial)
```
An important point to make is that, because the pedigree and data for `ggTutorial` have essentially the same sizes and structures (only `ggPed` now has two extra rows for the genetic groups - but these will be dropped automatically from **Q** by `ggcontrib()`), the genetic group coefficients from **Q** can be added directly to the data with `ggTutorial <- cbind(ggTutorial, Q)` above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the **Q** matrix contains a row for every individual in the pedigree, only the subset of rows in **Q** that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
The fixed regressions on `foc0` and `g1` can be combined with the standard <b>A<sup>-1</sup></b> to specify an animal model in `asreml()`. Because unique solutions to the model do not exist (see discussion of estimability in *Appendix S6.2*) for the intercept, coefficient of inbreeding (*f*), and genetic group effects (`foc0` and `g1`), we fit the `foc0` genetic group as the last fixed effect. This ensures that the model sets `foc0` as a reference group and estimates the `g0` genetic group effect as a deviation from the mean breeding value in the `foc0` group.
```{r asremlr_noneval_ggReg, eval = FALSE}
ggRegASR <- asreml(fixed = p ~ f + g1 + foc0,
random = ~ ped(id),
ginverse = list(id = listAinv),
data = ggTutorial)
```
```{r asremlr_load_ggReg, include = FALSE}
load(file = "./ASRemlR/ggRegASR")
```
The estimates of the genetic group effects are included with the fixed effect estimates. Note the software has set the `foc0` group estimate to zero in this model, implying this is the reference group and the `g1` genetic group effect is the estimated deviation. The variance of the estimated genetic group effect (`g1`) is included with the variances of the fixed effect estimates, which are reported on `asreml`'s underlying transformed scale. To re-scale the variances of the fixed effects back to the phenotypic scale and report them as standard errors:
```{r asremlr_estGG}
with(ggRegASR, cbind(Effect = coefficients$fixed,
seEffect = sqrt(vcoeff$fixed * sigma2)))
```
The above estimate of `r round(ggRegASR$coefficients$fixed[[2]], 3)` agrees with the expected difference between mean breeding value in the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The predicted breeding values (**a**) can be accessed from `ggRegASR$coefficients$random`. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the total additive genetic effects (**u**) can be calculated without any further manipulation as (eqn 5 in main text):
```{r asremlr_calcU}
reg_gHat <- matrix(ggRegASR$coefficients$fixed[1:2], ncol = 1)
reg_u <- Q %*% reg_gHat + ggRegASR$coefficients$random
```
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the breeding values (**a**). The `R` function `match()` is particularly helpful in these cases.
#### 6.4.3.3 Fixed implicit genetic group effects with A* (from `nadiv` or `asreml`)
Fitting fixed effects of genetic groups implicity within the random effects of an animal model requires the augmented inverse relatendess matrix (<b>A\*</b>) to be supplied as a list in the `ginverse` argument. First, we create the sparse matrix of <b>A\*</b> for the `ggPed` pedigree using the `makeAinv` function in the `nadiv` package and then demonstrate how to obtain this same matrix with the `asreml` package.
```{r asremlr_hidden_makeAstar, include = FALSE}
listAstar <- makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$listAinv
```
```{r asremlr_noneval_makeAstar, eval = FALSE}
listAstar <- makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$listAinv
```
In the above code, we directed `makeAinv()` to include the genetic groups on bottom (bottom-right block) of <b>A\*</b> using `gOnTop = FALSE`. This was done to improve model convergence (see *Appendix 6.3* and its subsections). In this case, the `asreml` model converges. In other pedigrees, models, and/or genetic group classifications, if singularities occur then sometimes this can be overcome by adding one (or a small value) to the diagonal elements of the group-group portion of the matrix (although see discussion of this in *Appendix S5* and cautions/considerations in *Appendix 6.3.2*). The alteration is demonstrated below along with cautions and interpretations (*Appendix 6.4.3.4*). Note, the easiest approach is to adjust the genetic group diagonals of <b>A\*</b> using the matrix returned by `makeAinv()` (i.e., object `Ainv` in the list), then convert the matrix to the list format required by `asreml`.
Alternatively, we can use `asreml`'s function `asreml.Ainverse()` to obtain a list format of <b>A\*</b>.
```{r asremlr_asremlAstar, eval = FALSE}
listAstar <- asreml.Ainverse(ggPed, groups = 2)$ginv
```
CAUTION: under the current and previous versions of `asreml` (current version at the end of Supporting Information), quite often `asreml.Ainverse()` simply does not work. It will return an object with a `ginv` entry that is a `data.frame`, but with zero rows. Sometimes, re-running the above command will work - otherwise use `nadiv`'s `makeAinv()` (*Appendix 6.4.3*)! Note that in the above code, `asreml.Ainverse()` includes the genetic groups on top (top-left block) of <b>A\*</b>. Often models fitting this structure of <b>A\*</b> have trouble converging due to this matrix not being positive-definite (see *Appendix 6.3* and its subsections). One strategy to remedy this is to place the genetic groups on the bottom of <b>A\*</b>. Since there is no easy option to specify this in `asreml.Ainverse()` we recommend using `nadiv`'s function `makeAinv()` as demonstrated above.
Regardless of which package was used to create `listAstar`, this sparse inverse matrix in list format has two extra properties ('attributes') that are critical for being able to use it in an `asreml` model. First (and not specific to just genetic group models), all `asreml` ginverse objects must have the "rowNames" attribute to map row/column numbers back to the factor levels in the model variable. Second, specific to models fitting fixed genetic groups implicitly within the random effects the `listAstar` object must have the "geneticGroups" attribute where the first number indicates the number of unique genetic groups. This is essential for `asreml` to correctly calculate the residual degrees of freedom as part of the residual variance and log-likelihood calculations. These attributes are the last two rows of the internal object structure:
```{r asremlr_str_listAstar}
str(listAstar)
```
Both of these attributes are discussed in the help file to `makeAinv()` and can be accessed by typing `?makeAinv` in `R`. Now the object `listAstar` can be associated with the individual identities (`id`) in the model by supplying the list to the `ginverse` argument in the `asreml()` function.
```{r asremlr_noneval_ggAstar, eval = FALSE}
ggAstarASR <- asreml(fixed = p ~ f,
random = ~ ped(id),
ginverse = list(id = listAstar),
data = ggTutorial)
```
```{r asremlr_load_ggAstar, include = FALSE}
load(file = "./ASRemlR/ggAstarASR")
```
The estimate of the genetic group effects are included with the random effect predictions. Since we directed `makeAinv()` to position the genetic groups on the bottom of `Astar`, the genetic group effects are the last two predicted values. The variance of the estimated genetic group effects are included with the variances of the random effect predictions, which are reported on `asreml`'s underlying transformed scale. To re-scale the variances of the fixed effects and genetic group effects back to the phenotypic scale and report these as standard errors along with the estimates themselves:
```{r asremlr_rfxEstimates}
# find the location of the genetic groups
(ggrows <- match(c("foc0", "g1"), attr(listAstar, "rowNames")))
# genetic group and fixed effects (with re-scaled standard errors)
with(ggAstarASR, cbind(Effect = c(coefficients$fixed, coefficients$random[ggrows]),
seEffect = sqrt(c(vcoeff$fixed, vcoeff$random[ggrows]) * sigma2)))
```
Note the software has set the intercept estimate to zero in this model (see discussion of estimability in *Appendix S6.2*), implying this is the reference to which the `foc0` and `g1` genetic group effects are the estimated deviations. Therefore, the predicted total additive genetic effects (**u**) include the overall phenotypic mean. However, the difference between the genetic group effects `r round(ggAstarASR$coefficients$random[ggrows[2]] - ggAstarASR$coefficients$random[ggrows[1]], 3)` agrees with the expected difference between the mean breeding values of the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The predicted total additive genetic effects (**u**) can be accessed from `ggAstarASR$coefficients$random[-ggrows]`. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the breeding values (**a**) can be calculated without any further manipulation as (eqn 6 in main text):
```{r asremlr_calcA}
Astar_gHat <- matrix(ggAstarASR$coefficients$random[ggrows], ncol = 1)
Astar_a <- ggAstarASR$coefficients$random[-ggrows] - Q %*% Astar_gHat
```
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (**u**) and used to calculate breeding values (**a**). The `R` function `match()` is particularly helpful in these cases.
#### 6.4.3.4 Random implicit genetic group effects with A* (from `nadiv`)
Fitting random effects of genetic groups implicity within the random effects of an animal model requires the augmented inverse relatendess matrix (<b>A\*</b>) to be supplied as a list in the `ginverse` argument. First, we create the sparse matrix of <b>A\*</b> for the `ggPed` pedigree using the `makeAinv` function in the `nadiv` package, then make the necessary alterations (see discussion of this in *Appendix S5* and cautions/considerations in *Appendix 6.3.2*) so the model interprets the genetic groups as random effects, convert the sparse matrix to a list, and fit the model.
```{r asremlr_hidden_makeAstarRan, include = FALSE}
Astar <- makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$Ainv
```
```{r asremlr_noneval_makeAstarRan, eval = FALSE}
Astar <- makeAinv(ggPed, ggroups = 2, gOnTop = FALSE)$Ainv
```
In the above code, we directed `makeAinv()` to include the genetic groups on bottom (bottom-right block) of <b>A\*</b> using `gOnTop = FALSE`. This was done to improve model convergence (see *Appendix 6.3* and its subsections).
Next we add a small value to the diagonal elements of the group-group portion of the matrix taking the same approach as in the `MCMCglmm` tutorial above (*Appendix 6.4.2.3*). As a result of this alteration, the genetic group effects are conceptually random effects in the model. Below, we add 0.1 to the diagonal element of the <b>A\*</b> which will constrain the variance in genetic group effects to be one tenth the additive genetic variance estimated by the model. A value of 0.1 may bias estimates of additive genetic variance in models of other datasets. Therefore, we advise testing values other than 0.1 (e.g., 1 or 10) and comparing estimates from these alternative models. Note that a value of 1 constrains the variance in genetic group effects to be equal to the additive genetic variance estimated by the model.
The easiest approach is to adjust the genetic group diagonals of <b>A\*</b>, and this is why we stored the matrix returned by `makeAinv()` above (i.e., object `Ainv` in the list). Once the additions are made, we can then convert the matrix to the list format required by `asreml`.
```{r asreml_Astar_plusI}
AstarAdd <- Astar
(ggrows <- match(c("foc0", "g1"), dimnames(AstarAdd)[[1]]))
diag(AstarAdd)[ggrows] <- diag(AstarAdd)[ggrows] + 0.1
Astar[ggrows, ggrows]
AstarAdd[ggrows, ggrows]
# convert to the list format using `nadiv::sm2list()`
listAstarAdd <- sm2list(AstarAdd, rownames = rownames(AstarAdd))
# Should be NO "geneticGroups" attribute (or element 1 of this attribute must==0)
str(listAstarAdd)
```
The object `listAstarAdd` can be associated with the individual identities (`id`) in the model by supplying the list to the `ginverse` argument in the `asreml()` function.
```{r asremlr_noneval_ggAstarRan, eval = FALSE}
ggAstarRanASR <- asreml(fixed = p ~ f,
random = ~ ped(id),
ginverse = list(id = listAstarAdd),
data = ggTutorial)
```
```{r asremlr_load_ggAstarRan, include = FALSE}
load(file = "./ASRemlR/ggAstarRanASR")
```
The predictions of the genetic group effects are included with the random effect predictions. Since we directed `makeAinv()` to position the genetic groups on the bottom of `Astar`, the genetic group effects are the last two predicted values. The variance of the random effect predictions are reported on `asreml`'s underlying transformed scale. To re-scale the variances of the genetic group predictions back to the phenotypic scale and report these as standard errors along with the predictions themselves and the fixed effect estimates (for comparison to other models):
```{r asremlr_rfxEstimatesRan}
# find the location of the genetic groups
(ggrows <- match(c("foc0", "g1"), attr(listAstarAdd, "rowNames")))
# genetic group and fixed effects (with re-scaled standard errors)
with(ggAstarRanASR, cbind(Effect = c(coefficients$fixed, coefficients$random[ggrows]),
seEffect = sqrt(c(vcoeff$fixed, vcoeff$random[ggrows]) * sigma2)))
```
Note the intercept in this model is estimable and so the model returns this estimate along with predicting both of the genetic group effects. The genetic group predictions, themselves, are random effects and thus are deviations from their expected value of 0. The difference between the genetic groups `r round(ggAstarRanASR$coefficients$random[[ggrows[2]]] - ggAstarRanASR$coefficients$random[[ggrows[1]]], 3)` agrees with the expected difference between the mean breeding values of the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The predicted total additive genetic effects (**u**) can be accessed from `ggAstarRanASR$coefficients$random[-ggrows]`. Here, the prediced total additive genetic effects include the genetic group effects expressed as deviations from 0. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the breeding values (**a**) can be calculated without any further manipulation as (eqn 6 in main text):
```{r asremlr_calcAran}
AstarRan_gHat <- matrix(ggAstarRanASR$coefficients$random[ggrows], ncol = 1)
AstarRan_a <- ggAstarRanASR$coefficients$random[-ggrows] - Q %*% AstarRan_gHat
```
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (**u**) and used to calculate breeding values (**a**). The `R` function `match()` is particularly helpful in these cases.
----------------------------------------------------------------
<!------------------------------------------------------------->
<!--------- ------------ ----------- ------------->
### 6.4.4 ASReml Standalone
Here we cover four approaches to fit genetic groups in an **ASReml** analysis. The first two use `nadiv` functions in `R` to create **Q** and <b>A\*</b>, which are each separately included in **ASReml** models. The third approach uses **ASReml** arguments and qualifiers to rely entirely upon the **ASReml** capabilities for fitting genetic groups. Whereas the first three approaches are used to fit genetic groups as fixed effects, the final approach demonstrates how to fit random effects of genetic groups.
The preparation of the `ggTutorial` data and requisite outputs from `nadiv` in `R` will first be demonstrated before detailing the code used in the **ASReml** model fitting. However, the basic files necessary to fit the **ASReml** models without any preparation in `R` are available in the accompanying supporting files on Dryad (Wolak & Reid 2016). The content of these files is summarized as:
* `ggTutorial.dat` contains the `ggTutorial` data. This file contains space separated columns from a subset of columns in the original `data.frame` in the following order: `id`, `p`, `is`, `gen`, `f`, `foc0`, and `g1`. This is needed for all models fitted.
* `reg.ped` contains the pedigree and is analogous to the first three columns of `ggTutorial`. This is needed to fit a model with genetic group effects explicitly as separate fixed regressions in the model.
* `nadivAstar.giv` and `nadivAstar.txt` are needed to fit genetic group effects implicitly within the random effects of the model using `nadiv`'s `makeAinv()` in `R` to create <b>A\*</b> (contained in `nadivAstar.giv`). `nadivAstar.txt` provides the row names for `nadivAstar.giv` and is needed to associate levels in `nadivAstar.giv` to identities in the data `ggTutorial.dat`. Further explanation is provided below where these files are created.
* `asremlGG.ped` contains the genetic group pedigree. It is the same as `reg.ped` except two rows are added for each of the two genetic groups and individuals do not have missing values, but genetic groups, specified for missing dams and sires. This is needed to fit genetic group effects implicitly within the random effects using **ASReml**'s own capabilities.
* `ggReg.as`, `ggNadivAstarFxd.as`, `ggAsremlAstarFxd.as`, and `ggAsremlRan.as` are included within the folders with similar names. Each specifies a different model to run.
To prepare the data for an animal model, all that is necessary is to calculate the coefficients of inbreeding (*f*; to minimize bias from inbreeding depression, see the last paragraph of *Appendix 6.4.1*). This calculation can be done using the `makeAinv()` function in the `nadiv` package.
```{r asreml_packages, eval = FALSE}
library(nadiv)
```
```{r asreml_makeF}
ggTutorial$f <- makeAinv(ggTutorial[, 1:3])$f
```
#### 6.4.4.1 Preparing a pedigree with genetic groups
Calculating the matrix of genetic group contributions (**Q**) and the augmented inverse relatedness matrix (<b>A\*</b>) requires a pedigree that has genetic groups indicated instead of missing values for parents. In the `ggTutorial` data, we will assign all individuals with unknown parents in the first generation phantom parents from one genetic group (`foc0`). All individuals in subsequent generations that have unknown parents will be assigned phantom parents from a second genetic group (`g1`). Alternatively, this second group could be further divided into genetic groups based on generation. However, the data were simulated such that every immigrant has the same expected mean breeding value, regardless of the generation in which it was born. Therefore, we will stick to a total of two genetic groups (but, feel free to model more groups!).
First, create an object that will be the pedigree with genetic groups (`ggPed`).
```{r asreml_makePed}
ggPed <- ggTutorial[, c("id", "dam", "sire", "is", "gen")]
naPar <- which(is.na(ggPed[, 2]))
ggPed$GG <- rep("NA", nrow(ggPed))
# 'focal' genetic group = "foc0" and 'immigrant' = "g1"
# obtained by pasting "foc" & "g" with immigrant status "0" or "1", respectively
ggPed$GG[naPar] <- as.character(ggPed$is[naPar])
ggPed$GG[ggPed$GG == "0"] <- paste0("foc", ggPed$GG[ggPed$GG == "0"])
ggPed$GG[ggPed$GG == "1"] <- paste0("g", ggPed$GG[ggPed$GG == "1"])
ggPed[naPar, 2:3] <- ggPed[naPar, "GG"]
# Two rows need to be added for the genetic groups
## Genetic groups will be given the missing value 0 instead of parent identities
ggPed <- data.frame(id = c("foc0", "g1", as.character(ggPed$id)),
dam = c(0, 0, as.character(ggPed$dam)),
sire = c(0, 0, as.character(ggPed$sire)))
head(ggPed)
```
#### 6.4.4.2 Fixed explicit genetic group effects with Q (from `nadiv`)
Fitting genetic group effects explicitly requires the columns of **Q** to be included as separate fixed covariate regressions. The code below creates **Q** for the `ggPed` pedigree.
<!--- #FIXME --- ---->
```{r asreml_noneval_makeQ, eval = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
```
```{r asreml_silent_makeQ, include = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
pedigree <- ggTutorial[, c("id", "dam", "sire")]
naPar <- which(is.na(pedigree[, 2]))
pedigree[naPar, 2:3] <- 0
```
```{r asreml_hidden_genPedWarning, echo = FALSE}
nPed <- numPed(pedigree)
```
<!--- #END FIXME ---- ---- ---->
Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example.
Now we add the columns of **Q** (`foc0` and `g1`) as variables in `ggTutorial` so that they can be included in a model.
```{r asreml_addGGfxdReg, eval = FALSE}
ggTutorial <- cbind(ggTutorial, Q)
```
```{r asreml_addGGfxdReg2}
str(ggTutorial)
```
An important point to make is that, because the pedigree and data for `ggTutorial` have essentially the same sizes and structures (only `ggPed` now has two extra rows for the genetic groups - but these will be dropped automatically from **Q** by `ggcontrib()`), the genetic group coefficients from **Q** can be added directly to the data with `ggTutorial <- cbind(ggTutorial, Q)` above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the **Q** matrix contains a row for every individual in the pedigree, only the subset of rows in **Q** that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Now we need to write the pedigrees and portion of the data to files in formats that **ASReml** can use. Note, **ASReml** expects pedigree columns ordered ID, Sire, Dam and we will use '0' to denote missing values (instead of NA).
```{r asreml_ggreg_makePed}
regPed <- ggTutorial[, 1:3]
regPed[is.na(regPed[, 2]), 2] <- 0
regPed[is.na(regPed[, 3]), 3] <- 0
```
```{r asreml_ggreg_writePedDat, eval = FALSE}
write.table(regPed[, c(1,3,2)], "./asreml/ggReg/reg.ped",
row.names = FALSE, quote = FALSE)
asData <- ggTutorial[, -c(2:7)]
write.table(asData, "./asreml/ggReg/ggTutorial.dat",
row.names = FALSE, quote = FALSE)
```
The fixed regressions on `foc0` and `g1` can be combined with the standard <b>A<sup>-1</sup></b> to specify an animal model in **ASReml** (see also the "./asreml/ggReg" folder in the supporting files). Because unique solutions to the model do not exist (see discussion of estimability in *Appendix S6.2*) for the intercept, coefficient of inbreeding (*f*), and genetic group effects (`foc0` and `g1`), we fit the `foc0` genetic group as the last fixed effect. This ensures that the model sets `foc0` as a reference group and estimates the `g0` genetic group effect as a deviation from the mean breeding value in the `foc0` group.
```
Fixed regression for Explicit genetic groups
id !P
p
is !I 0 1
gen 15
f
foc0
g1
reg.ped !ALPHA !SKIP 1 !MAKE !GIV !DIAG
ggTutorial.dat !SKIP 1
p ~ mu f g1 foc0 !r id
```
The estimate of the genetic group effects and standard errors are reported as the genetic group fixed effect regression coefficients and standard errors in the "ggReg.sln" file. These can be read back into `R`:
```{r read_asremlSLNfile}
regPred <- read.table("./asreml/ggReg/ggReg.sln", header = TRUE)
```
so that we can see the fixed effect estimates and their standard errors:
```{r asreml_estGGreg}
regPred[1:4, ]
```
Note the software has set the `foc0` group estimate to zero in this model, implying this is the reference group and the `g1` genetic group effect is the estimated deviation from this reference. The above estimate of `r round(regPred[2, "Effect"], 3)` agrees with the expected difference between mean breeding value in the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The predicted breeding values (**a**) are also reported in the "ggReg.sln" file (see now `R` object `regPred`) and follow below the fixed effect estimates. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the total additive genetic effects (**u**) can be calculated without any further manipulation as (eqn 5 in main text):
```{r asreml_calcU}
reg_gHat <- matrix(regPred[1:2, "Effect"], ncol = 1)
reg_u <- Q %*% reg_gHat + regPred[-c(1:4), "Effect"]
```
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the breeding values (**a**). The `R` function `match()` is particularly helpful in these cases.
#### 6.4.4.3 Fixed implicit genetic group effects with A* (from `nadiv`)
Genetic group effects can be fit implicitly within the random effects when the augmented inverse relatedness matrix <b>A\*</b> has been supplied as a list to **ASReml**. The code below creates <b>A\*</b> for the `ggPed` pedigree and saves it as a list in the format **ASReml** requires.
First, we will calculate **Q** and include it in the dataset for consistency with other versions of this data.
<!--- #FIXME --- ---->
```{r asreml_astarNadiv_noneval_makeQ, eval = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
```
```{r asreml_astarNadiv_hidden_makeQ, include = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
pedigree <- ggTutorial[, c("id", "dam", "sire")]
naPar <- which(is.na(pedigree[, 2]))
pedigree[naPar, 2:3] <- 0
```
```{r asreml_astarNadiv_genPedWarning, echo = FALSE}
nPed <- numPed(pedigree)
```
<!--- #END FIXME ---- ---- ---->
Note that the two warnings above (regarding zeroes being interpreted as missing dams/sires) are to be expected and should not be cause for any concern in this particular pedigree/example.
Add the columns of **Q** (`foc0` and `g1`) as variables in `ggTutorial`.
```{r asreml_astarNadiv_addGG, eval = FALSE}
ggTutorial <- cbind(ggTutorial, Q)
```
```{r asreml_astarNadiv_addGG2}
str(ggTutorial)
```
An important point to make is that, because the pedigree and data for `ggTutorial` have essentially the same sizes and structures (only `ggPed` now has two extra rows for the genetic groups - but these will be dropped automatically from **Q** by `ggcontrib()`), the genetic group coefficients from **Q** can be added directly to the data with `ggTutorial <- cbind(ggTutorial, Q)` above. However, whenever a pedigree contains more individuals than the data (or in a different order), an intermediate step is required. Specifically, since the **Q** matrix contains a row for every individual in the pedigree, only the subset of rows in **Q** that correspond to phenotyped individuals in a dataset should be extracted and ordered according to the identities in the data.
Now we create the sparse matrix of <b>A\*</b> for the `ggPed` pedigree using the `makeAinv` function in the `nadiv` package.
```{r asreml_hidden_nadiv_makeAstar, include = FALSE}
listAstar <- makeAinv(ggPed, ggroups = 2, gOnTop = TRUE)$listAinv
```
```{r asreml_uneval_nadiv_makeAstar, eval = FALSE}
listAstar <- makeAinv(ggPed, ggroups = 2, gOnTop = TRUE)$listAinv
```
In the above code, we directed `makeAinv()` to include the genetic groups on top (top-left block) of <b>A\*</b> using `gOnTop = TRUE`. This is necessary so that **ASReml** solves the equations in an order that improves model convergence (see *Appendix 6.3* and its subsections). In this case, the **ASReml** model converges. In other pedigrees, models, and/or genetic group classifications, if singularities occur then sometimes this can be overcome by adding one (or a small value) to the diagonal elements of the group-group portion of the matrix (although see discussion of this in *Appendix S5* and cautions/considerations in *Appendix 6.3.2*). The alteration is demonstrated above in the `asreml` tutorial along with cautions and interpretations (*Appendix 6.4.3.4*). The resulting list could then be given to the standalone **ASReml** program much the same as we do here (see also *Appendix 6.4.4.5*). Note, the easiest approach is to adjust the genetic group diagonals of <b>A\*</b> using the matrix returned by `makeAinv()` (i.e., object `Ainv` in the list), then convert the matrix to the list format required by **ASReml**.
Now we need to write the data (`ggTutorial`) and list of the <b>A\*</b> (`listAstar`) to a file for **ASReml**. Further, we need to write to a file all of the identity codes in the pedigree that match to the rows and columns in `listAstar`. This enables **ASReml** to match levels in the generalized inverse (row and column numbers of `listAstar`) to observations in `ggTutorial`. Critically, we need to include the "geneticGroups" attribute of `listAstar` as the first line in the file containing this list.
```{r asreml_astarNadiv_writePedDat, eval = FALSE}
# Write the nadiv Astar ginv list
write.table(listAstar,
file = "./asreml/ggNadivAstarFxd/nadivAstar.giv",
col.names = FALSE, row.names = FALSE)
fConn <- file("./asreml/ggNadivAstarFxd/nadivAstar.giv", "r+")
Lines <- readLines(fConn)
# Add the !GROUPSDF *n* qualifier as the first line of the file
ngrps <- attr(listAstar, "geneticGroups")[1]
writeLines(c(paste0("!GROUPSDF ", ngrps), Lines), con = fConn)
close(fConn)
# Create mapping between order of identities in listAstar and their unique codes
write.table(ggPed[, 1], file = "./asreml/ggNadivAstarFxd/nadivAstar.txt",
col.names = FALSE, row.names = FALSE, quote = FALSE)
# Write the data
asData <- ggTutorial[, -c(2:7)]
write.table(asData, "./asreml/ggNadivAstarFxd/ggTutorial.dat",
row.names = FALSE, quote = FALSE)
```
Now we can run the animal model in **ASReml** using the supplied general inverse matrix `listAstar` in the file "nadivAstar.giv" (see also the "./asreml/ggNadivAstarFxd" folder in the supporting files):
```
nadiv's Astar for fixed genetic group effects implicitly within the random effects
id !A !L nadivAstar.txt
p
is !I 0 1
gen 15
f
foc0
g1
nadivAstar.giv
ggTutorial.dat !SKIP 1
!LAST id 2
p ~ mu f !r giv1(id)
```
Here, genetic group effects are fitted in the model using `nadiv`'s generalized inverse by including `giv1(id)` in the random effects syntax statement. The line below the data file, but before the model statement, (`!LAST id 2`) tells **ASReml** to fit the first two equations associated with the `id` variable (the genetic groups) last. This is done to help the model to converge (see *Appendix 6.3.1*).
The genetic group predictions and standard errors are reported in the "ggNadivAstarFxd.sln" file. These can be read back into `R`:
```{r read_asremlSLNfile_nadivImpFxd}
nadivAstarFxdPred <- read.table("./asreml/ggNadivAstarFxd/ggNadivAstarFxd.sln", header = TRUE)
```
so that we can see the fixed effect estimates and their standard errors along with the genetic group predictions:
```{r asreml_estGGFxdNadiv}
nadivAstarFxdPred[1:4, ]
```
Note the software has set the intercept (`mu`) estimate to zero in this model (see discussion of estimability in *Appendix S6.2*), implying this is the reference to which the `foc0` and `g1` genetic group effects are the estimated deviations. Therefore, the predicted total additive genetic effects (**u**) include the overall phenotypic mean. However, the difference between the genetic group effects `r round(nadivAstarFxdPred[4, "Effect"] - nadivAstarFxdPred[3, "Effect"], 3)` agrees with the expected difference between the mean breeding values of the immigrant group (`3`) and the founder population (`0`) that was specified in the simulation.
The predicted total additive genetic effects (**u**) are also reported in the "ggNadivAstarFxd.sln" file (see now `R` object `nadivAstarFxdPred`) and follow below the genetic group predictions. Because the order of rows in **Q** matches the order of individual identities in `ggTutorial`, the predicted breeding values (**a**) can be calculated without any further manipulation as (eqn 6 in main text):
```{r asreml_calcAFxdNadiv}
AstarFxd_gHat <- matrix(nadivAstarFxdPred[3:4, "Effect"], ncol = 1)
AstarFxd_a <- nadivAstarFxdPred[-c(1:4), "Effect"] - Q %*% AstarFxd_gHat
```
Note, if more individuals are in the pedigree than the dataset and/or the identities in the pedigree and data are ordered differently, more coding is necessary to ensure that the predicted genetic effect for a particular individual is added to the correct weighted genetic group contribution. Additionally, if more random effects are included in the model, care has to be taken in order to ensure the correct random effect predictions are extracted for the total additive genetic effects (**u**) and used to calculate breeding values (**a**). The `R` function `match()` is particularly helpful in these cases.
#### 6.4.4.4 Fixed implicit genetic group effects with A* (from ASReml)
**ASReml** can fit genetic group effects implicitly within the random effects using its own function to construct <b>A\*</b>. First, we construct the data and pedigree files in `R`.
Calculate **Q** and include it in the dataset for consistency with other versions of this data.
<!--- #FIXME --- ---->
```{r asreml_asrAstar_noneval_makeQ, eval = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
```
```{r asreml_asrAstar_hidden_makeQ, include = FALSE}
Q <- ggcontrib(ggPed[, 1:3])
pedigree <- ggTutorial[, c("id", "dam", "sire")]
naPar <- which(is.na(pedigree[, 2]))
pedigree[naPar, 2:3] <- 0
```
```{r asreml_asrAstar_genPedWarn, echo = FALSE}
nPed <- numPed(pedigree)
```