diff --git a/.nojekyll b/.nojekyll
index eb958c4..bcc76ef 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-f9aa7cb2
\ No newline at end of file
+c9874b78
\ No newline at end of file
diff --git a/mod_stats.html b/mod_stats.html
index e0d57a4..40ec784 100644
--- a/mod_stats.html
+++ b/mod_stats.html
@@ -357,7 +357,6 @@

On this page

  • Multi-Model Inference
  • Meta-Analysis

@@ -415,13 +414,15 @@

    Needed Packages

    # Note that these lines only need to be run once per computer
     ## So you can skip this step if you've installed these before
     install.packages("tidyverse")
    -install.packages("lmerTest")
    +install.packages("lmerTest") +install.packages("palmerpenguins")

    We’ll go ahead and load some of these libraries as well to be able to better demonstrate these concepts.

    # Load needed libraries
     library(tidyverse)
    -library(lmerTest)
    +library(lmerTest)
    +library(palmerpenguins)
    @@ -735,12 +736,95 @@

    Multi-Model Inference
  • X, W, and Z together explain the most variation in Y
  • We might also fit other candidate models for pairs of X, W, and Z but for the sake of simplicity in this hypothetical we’ll skip those. Note that for this method to be appropriate you need to fit the same type of model in all cases!

    -

    Once we’ve fit all of our models and assigned them to objects, we can use the AIC function included in base R to compare the AIC score of each model. “AIC” stands for Akaike (AH-kuh-ee-kay) Information Criterion and is one of several related information criteria for summarizing a model’s explanatory power. The lowest AIC best explains the data and models with more parameters are penalized to make it mathematically possible for a model with fewer explanatory variables to still do a better job capturing the variation in the data. Technically any difference in AIC indicates model improvement but many scientists use a rule of thumb of a difference of 2. So, if two models have AIC scores that differ by less than 2, you can safely say that they have comparable explanatory power. That is definitely a semi-arbitrary threshold but so is the 0.05 threshold for p-value “significance”.

    -
    -

    Philosophical Note: Model Weights & Model Averaging

    -
    +

    Once we’ve fit all of our models and assigned them to objects, we can use the AIC function included in base R to compare the AIC score of each model. “AIC” stands for Akaike (AH-kuh-ee-kay) Information Criterion and is one of several related information criteria for summarizing a model’s explanatory power. Models with more parameters are penalized to make it mathematically possible for a model with fewer explanatory variables to still do a better job capturing the variation in the data.
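    For reference, the underlying calculation is AIC = 2k − 2ln(L̂), where k is the number of parameters the model estimates and L̂ is the model’s maximized likelihood; the 2k term is what penalizes extra parameters.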

    +

    Among your candidate models, the one with the lowest AIC best explains the data. Technically, any reduction in AIC indicates model improvement, but many scientists use a rule of thumb of a difference of 2. So, if two models have AIC scores that differ by less than 2, you can safely say that they have comparable explanatory power. That is definitely a semi-arbitrary threshold, but so is the 0.05 threshold for p-value “significance”.
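    To make that rule of thumb concrete, here is a minimal sketch with hypothetical AIC scores (the model names and values are made up purely for illustration):

    +# Hypothetical AIC scores for two candidate models
    +aic_scores <- c(mod_a = 100.3, mod_b = 101.8)
    +
    +# Difference from the best (lowest) score
    +aic_scores - min(aic_scores)

    The difference here is 1.5, which falls under the threshold of 2, so these two models would be treated as having comparable explanatory power.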

    AIC Case Study

    +

    Let’s check out an example using AIC to compare the strengths of several models. Rather than using simulated data (as we did earlier in the mixed-effects model section), we’ll use some real penguin data included in the palmerpenguins package.

    +

    This dataset includes several years of data on three penguin species spread across several islands. Each penguin’s sex was recorded, along with its flipper length, body mass, and bill length and depth.

    +

    For the purposes of this example, our research question is as follows: what factors best explain penguin body mass?

    +
    +
    # Load the penguins data from the `palmerpenguins` package
    +data(penguins)
    +
    +# Make a version where no NAs are allowed
    +peng_complete <- penguins[complete.cases(penguins), ]
    +
    +# Check the structure of it
    +dplyr::glimpse(peng_complete)
    +
    +
    +
    1
    +
    +This is a base R way of keeping only rows that have no NA values in any column. It is better to identify and handle NAs more carefully, but in this context we just want the same number of observations in each model.
    +
    +
    +
    +
    Rows: 333
    +Columns: 8
    +$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
    +$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
    +$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
    +$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
    +$ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
    +$ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
    +$ sex               <fct> male, female, female, female, male, female, male, fe…
    +$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
    +
    +
    +
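    As the footnote above suggests, real analyses usually deserve a more deliberate approach to NAs. If you’re already loading the tidyverse, one equivalent option is tidyr’s drop_na function, which can also be limited to specific columns (peng_subset below is purely illustrative):

    +# Tidyverse equivalent of the complete.cases() approach above
    +peng_complete <- tidyr::drop_na(penguins)
    +
    +# Or only drop rows with NAs in the columns your models will actually use
    +peng_subset <- tidyr::drop_na(penguins, body_mass_g, sex)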

    With our data in hand and research question in mind, we can fit several candidate models that our scientific intuition and the published literature support as probable, then compare them with AIC.

    +
    +
    # Species and sex
    +mod_spp <- lm(body_mass_g ~ species + sex, data = peng_complete)
    +
    +# Island alone
    +mod_isl <- lm(body_mass_g ~ island, data = peng_complete)
    +
    +# Combination of island, species, and sex
    +mod_eco <- lm(body_mass_g ~ island + species + sex, data = peng_complete)
    +
    +# Body characteristics alone
    +mod_phys <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm + bill_depth_mm,
    +               data = peng_complete)
    +
    +# Global model
    +mod_sink <- lm(body_mass_g ~ island + species + sex +
    +               flipper_length_mm + bill_length_mm + bill_depth_mm,
    +               data = peng_complete)
    +
    +
    +
    1
    +
    +We’ve named the global model “sink” because of the American idiom “everything but the kitchen sink,” used in cases where everything that can be included has been.
    +
    +
    +
    +

    Once we’ve fit all of these models, we can use the AIC function from base R (technically from the stats package included in base R).

    +
    +
    # Compare models
    +AIC(mod_spp, mod_isl, mod_eco, mod_phys, mod_sink) %>% 
    +  dplyr::arrange(AIC)
    +
    +
    +
    1
    +
    +Unfortunately, the AIC function doesn’t sort by AIC score automatically, so we’re using the arrange function to make it easier to rank models by their AIC scores.
    +
    +
    +
    +
             df      AIC
    +mod_sink 10 4727.242
    +mod_spp   5 4785.594
    +mod_eco   7 4789.480
    +mod_phys  5 4929.554
    +mod_isl   4 5244.224
    +
    +
    +

    Interestingly, it looks like the best model (i.e., the one that best explains the data) is the global model that included most of the available variables. As stated earlier, models with more parameters are penalized, so the global model earning the lowest AIC despite that penalty suggests this is a “real” result rather than an artifact of its size. The difference between that model and the next best (incidentally, the model with only species and sex as explanatory variables) is much larger than 2, so we can be confident that the global model is substantially better.
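    To make that comparison explicit (and to connect it to the model weights mentioned in the philosophical note above), you can compute ΔAIC and Akaike weights by hand. A minimal sketch, reusing the aic_tab object from the base R example:

    +# Delta AIC: each model's distance from the best (lowest) score
    +aic_tab$delta <- aic_tab$AIC - min(aic_tab$AIC)
    +
    +# Akaike weights: relative support for each candidate model
    +aic_tab$weight <- exp(-0.5 * aic_tab$delta) / sum(exp(-0.5 * aic_tab$delta))
    +aic_tab[order(aic_tab$AIC), ]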

    +

    With this result, your interpretation would be that penguin body mass is better explained by the combination of species, sex, physical characteristics of the individual penguin, and the penguin’s home island than by any of the other candidate models. In a publication you’d likely want to report the full set of AIC scores (either parenthetically or in a table) so that reviewers can evaluate your logic.

diff --git a/mod_stats_files/figure-html/mem-explore-graph-1.png b/mod_stats_files/figure-html/mem-explore-graph-1.png
index 3be20c0..38e60e6 100644
Binary files a/mod_stats_files/figure-html/mem-explore-graph-1.png and b/mod_stats_files/figure-html/mem-explore-graph-1.png differ